Optimizing an FSM-Based Stack for Low-Latency Systems

Implementing a Stack with an FSM in Hardware and SoftwareA stack is one of the most fundamental abstract data structures: last-in, first-out (LIFO). Stacks are used in function call management, expression evaluation, backtracking algorithms, undo mechanisms, and many hardware-control tasks. Implementing a stack controlled by a finite state machine (FSM) unifies control flow and data handling, and is a useful design pattern both in software (embedded systems, RTOS components) and in hardware (FPGA/ASIC designs). This article walks through concepts, design choices, example implementations, verification strategies, and optimizations for FSM-based stacks.


Overview: Why combine an FSM and a stack?

An FSM provides deterministic control over a sequence of operations, while a stack provides a simple yet powerful data storage primitive. Combining them is natural when stack operations must obey strict timing, sequencing, or when external handshaking is required. Example scenarios:

  • Hardware: microcontroller interrupt or subroutine return contexts, expression evaluation units, or a push/pop buffer between clock domains.
  • Software/firmware: driver state machines that push context before entering a critical routine, or parsing engines that manage nested constructs under constrained timing.

Key benefits:

  • Clear separation of control and storage: FSM controls when to push, pop, or idle; the stack module handles data storage.
  • Deterministic timing: FSM-based control ensures operations happen in well-defined cycles—critical in hardware.
  • Easier formal verification: both FSMs and stacks have compact state spaces that are amenable to model checking.

Basic stack operations and FSM states

A stack supports at least three operations:

  • push(data): place data on top
  • pop(): remove and return top element
  • peek()/top(): read top element without removing (optional)
  • empty/full status checks

For a simple FSM-based stack, common control states include:

  • IDLE — waiting for an operation request
  • PUSH_PREP — prepare to accept push data (if multi-cycle)
  • PUSH_WRITE — write data into memory and update pointer
  • POP_READ — read top data and update pointer
  • ERROR/UNDERFLOW/FULL — handle exceptional conditions
  • ACK/COMPLETE — signal completion to requester

Even a minimal implementation can be built with as few as 3–4 states (IDLE, PUSH, POP, ERROR).


Hardware implementation (FPGA/ASIC) — Verilog example

Design goals for hardware:

  • Single-cycle push/pop where possible, or well-defined multi-cycle sequences.
  • Proper handling of synchronous memory (BRAM or register array).
  • Flags for empty/full.
  • Handshaking signals when interacting with other modules (valid/ready, req/ack).

Below is a conceptual synchronous Verilog-style implementation (register-transfer level). This example targets a small parameterizable stack using a memory array and an FSM that sequences push and pop operations.

// stack_fsm.v // Parameterizable stack depth and data width module stack_fsm # (     parameter DATA_WIDTH = 8,     parameter DEPTH = 16,     parameter PTR_WIDTH = $clog2(DEPTH) ) (     input  wire                   clk,     input  wire                   rstn,    // active low reset     // control interface     input  wire                   push_req,     input  wire                   pop_req,     input  wire [DATA_WIDTH-1:0]  push_data,     output reg  [DATA_WIDTH-1:0]  pop_data,     output reg                    empty,     output reg                    full,     output reg                    busy,     output reg                    error     // underflow/overflow );     // storage     reg [DATA_WIDTH-1:0] mem [0:DEPTH-1];     reg [PTR_WIDTH:0]    sp; // stack pointer: points to next free slot     // FSM states     typedef enum reg [2:0] {         S_IDLE = 3'd0,         S_PUSH = 3'd1,         S_POP  = 3'd2,         S_ERROR= 3'd3,         S_DONE = 3'd4     } state_t;     state_t state, next_state;     // combinational next-state logic     always @(*) begin         next_state = state;         case (state)             S_IDLE: begin                 if (push_req && !pop_req) begin                     if (full) next_state = S_ERROR;                     else next_state = S_PUSH;                 end else if (pop_req && !push_req) begin                     if (empty) next_state = S_ERROR;                     else next_state = S_POP;                 end else begin                     next_state = S_IDLE;                 end             end             S_PUSH: next_state = S_DONE;             S_POP:  next_state = S_DONE;             S_DONE: next_state = S_IDLE;             S_ERROR: next_state = S_DONE;             default: next_state = S_IDLE;         endcase     end     // sequential state update and datapath actions     always @(posedge clk or negedge rstn) begin         if (!rstn) begin             state <= S_IDLE;             sp <= 0;             empty <= 1;             full <= 0;             busy <= 0;             error <= 0;             pop_data <= 0;         end else begin             state <= next_state;             // default signals             busy <= (next_state != S_IDLE);             error <= 0;             case (next_state)                 S_PUSH: begin                     mem[sp] <= push_data;                     sp <= sp + 1;                 end                 S_POP: begin                     sp <= sp - 1;                     pop_data <= mem[sp - 1];                 end                 S_ERROR: begin                     error <= 1;                 end                 default: begin end             endcase             // update flags             empty <= (sp == 0);             full  <= (sp == DEPTH);         end     end endmodule 

Notes:

  • The above uses a simple synchronous array and a pointer; push writes at mem[sp] then increments sp; pop decrements sp then reads mem[sp-1].
  • For real FPGA BRAM, read/write timing and collision rules can require a one-cycle latency for reads; adjust FSM (introduce intermediate read or write-complete states) accordingly.
  • Add handshake signals (valid/ready) if interacting with streaming interfaces.

Software implementation (embedded C / RTOS)

In software, an FSM-based stack is often used when stack operations must coordinate with I/O, interrupts, or when non-blocking behavior is required. Use an explicit state variable and event-driven loop (polling or event callbacks). Below is a simple event-driven C-style pattern suitable for small embedded systems.

// stack_fsm.c #include <stdint.h> #include <stdbool.h> #include <stddef.h> #define DEPTH 16 typedef uint8_t data_t; static data_t mem[DEPTH]; static size_t sp = 0; typedef enum {     IDLE,     DO_PUSH,     DO_POP,     HANDLE_ERROR,     DONE } state_t; static state_t state = IDLE; static bool push_req = false; static bool pop_req = false; static data_t push_val; static data_t pop_val; static bool error = false; void request_push(data_t v) {     push_val = v;     push_req = true; } void request_pop(void) {     pop_req = true; } void stack_fsm_tick(void) {     switch(state) {         case IDLE:             if (push_req && !pop_req) {                 if (sp >= DEPTH) { error = true; state = HANDLE_ERROR; }                 else state = DO_PUSH;             } else if (pop_req && !push_req) {                 if (sp == 0) { error = true; state = HANDLE_ERROR; }                 else state = DO_POP;             }             break;         case DO_PUSH:             mem[sp++] = push_val;             push_req = false;             state = DONE;             break;         case DO_POP:             pop_val = mem[--sp];             pop_req = false;             state = DONE;             break;         case HANDLE_ERROR:             // handle overflow/underflow (log, set flag)             state = DONE;             break;         case DONE:             state = IDLE;             break;     } } 

Design notes:

  • Call stack_fsm_tick periodically from a main loop or from a scheduler. For RTOS, you can integrate it into a task or run-to-completion handler.
  • For concurrency (ISRs, multiple tasks) protect shared state with mutexes or disable interrupts around requests and state transitions.

Handshaking and backpressure

When the stack interfaces with other modules, especially in hardware pipelines or streaming data paths, use a handshake such as valid/ready to avoid data loss and provide backpressure. Example signals:

  • in_valid / in_ready for push data
  • out_valid / out_ready for pop/peek results

Integrate these into FSM transitions: only move from PUSH_PREP to PUSH_WRITE when in_valid && in_ready; only complete POP_READ when out_ready.


Verification and testing

Hardware:

  • Unit tests in simulation: random sequences of push/pop, including boundary cases.
  • Assertions: check that sp never exceeds DEPTH or goes below 0; check empty/full signals consistent with sp.
  • Formal proofs: use model checking for invariants (e.g., sp ∈ [0, DEPTH]).
  • FPGA verification: hardware-in-the-loop tests, timing constraints for BRAM/regs.

Software:

  • Unit tests with randomized push/pop sequences.
  • Stress tests under interrupts and concurrent access.
  • Code review for race conditions; use static analysis tools for concurrency issues.

Optimizations and variations

  • Dual-ported BRAM / register files allow simultaneous push and pop on separate ports.
  • Circular buffer style stack: uncommon, but useful when implementing a bounded LIFO with overwrite-on-full semantics.
  • Pointer encoding: gray-code or error-detecting encodings for resilience in noisy hardware.
  • Width/depth trade-offs: wider data vs deeper stack affects BRAM utilization.
  • Multi-element push/pop: for throughput, support batching several elements per operation (adds complexity in pointer arithmetic and boundary handling).
  • Shadow pointer / speculative push: useful when pipeline needs to speculatively accept data and rollback on misprediction.

Common pitfalls

  • Off-by-one errors with stack pointer updates: decide whether sp points to next free slot or the current top and stick with it consistently.
  • BRAM read latency: synchronous RAM often returns data one cycle after address, so FSM must accommodate.
  • Concurrent access in software: race conditions if requests and pointer updates are not protected.
  • Full/empty indicator glitches when push and pop happen simultaneously without atomic handling.

Example use cases

  • Return-address stack in small soft-core processors.
  • Expression evaluator in hardware calculators or parsing accelerators.
  • Command undo buffer in embedded UIs.
  • Scratchpad for nested state contexts in a protocol engine.

Summary

Implementing a stack using an FSM yields a robust, deterministic design suitable for both hardware and software. The FSM controls operation sequencing, error handling, and handshaking while a separate storage datapath (registers, BRAM, or an array) holds values. Key considerations include pointer conventions, timing of memory reads/writes, handshake protocols, and concurrency protection. With clear state design, careful verification, and attention to memory timing, an FSM-based stack provides predictable behavior essential for many embedded and hardware systems.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *