Implementing a Stack with an FSM in Hardware and SoftwareA stack is one of the most fundamental abstract data structures: last-in, first-out (LIFO). Stacks are used in function call management, expression evaluation, backtracking algorithms, undo mechanisms, and many hardware-control tasks. Implementing a stack controlled by a finite state machine (FSM) unifies control flow and data handling, and is a useful design pattern both in software (embedded systems, RTOS components) and in hardware (FPGA/ASIC designs). This article walks through concepts, design choices, example implementations, verification strategies, and optimizations for FSM-based stacks.
Overview: Why combine an FSM and a stack?
An FSM provides deterministic control over a sequence of operations, while a stack provides a simple yet powerful data storage primitive. Combining them is natural when stack operations must obey strict timing, sequencing, or when external handshaking is required. Example scenarios:
- Hardware: microcontroller interrupt or subroutine return contexts, expression evaluation units, or a push/pop buffer between clock domains.
- Software/firmware: driver state machines that push context before entering a critical routine, or parsing engines that manage nested constructs under constrained timing.
Key benefits:
- Clear separation of control and storage: FSM controls when to push, pop, or idle; the stack module handles data storage.
- Deterministic timing: FSM-based control ensures operations happen in well-defined cycles—critical in hardware.
- Easier formal verification: both FSMs and stacks have compact state spaces that are amenable to model checking.
Basic stack operations and FSM states
A stack supports at least three operations:
- push(data): place data on top
- pop(): remove and return top element
- peek()/top(): read top element without removing (optional)
- empty/full status checks
For a simple FSM-based stack, common control states include:
- IDLE — waiting for an operation request
- PUSH_PREP — prepare to accept push data (if multi-cycle)
- PUSH_WRITE — write data into memory and update pointer
- POP_READ — read top data and update pointer
- ERROR/UNDERFLOW/FULL — handle exceptional conditions
- ACK/COMPLETE — signal completion to requester
Even a minimal implementation can be built with as few as 3–4 states (IDLE, PUSH, POP, ERROR).
Hardware implementation (FPGA/ASIC) — Verilog example
Design goals for hardware:
- Single-cycle push/pop where possible, or well-defined multi-cycle sequences.
- Proper handling of synchronous memory (BRAM or register array).
- Flags for empty/full.
- Handshaking signals when interacting with other modules (valid/ready, req/ack).
Below is a conceptual synchronous Verilog-style implementation (register-transfer level). This example targets a small parameterizable stack using a memory array and an FSM that sequences push and pop operations.
// stack_fsm.v // Parameterizable stack depth and data width module stack_fsm # ( parameter DATA_WIDTH = 8, parameter DEPTH = 16, parameter PTR_WIDTH = $clog2(DEPTH) ) ( input wire clk, input wire rstn, // active low reset // control interface input wire push_req, input wire pop_req, input wire [DATA_WIDTH-1:0] push_data, output reg [DATA_WIDTH-1:0] pop_data, output reg empty, output reg full, output reg busy, output reg error // underflow/overflow ); // storage reg [DATA_WIDTH-1:0] mem [0:DEPTH-1]; reg [PTR_WIDTH:0] sp; // stack pointer: points to next free slot // FSM states typedef enum reg [2:0] { S_IDLE = 3'd0, S_PUSH = 3'd1, S_POP = 3'd2, S_ERROR= 3'd3, S_DONE = 3'd4 } state_t; state_t state, next_state; // combinational next-state logic always @(*) begin next_state = state; case (state) S_IDLE: begin if (push_req && !pop_req) begin if (full) next_state = S_ERROR; else next_state = S_PUSH; end else if (pop_req && !push_req) begin if (empty) next_state = S_ERROR; else next_state = S_POP; end else begin next_state = S_IDLE; end end S_PUSH: next_state = S_DONE; S_POP: next_state = S_DONE; S_DONE: next_state = S_IDLE; S_ERROR: next_state = S_DONE; default: next_state = S_IDLE; endcase end // sequential state update and datapath actions always @(posedge clk or negedge rstn) begin if (!rstn) begin state <= S_IDLE; sp <= 0; empty <= 1; full <= 0; busy <= 0; error <= 0; pop_data <= 0; end else begin state <= next_state; // default signals busy <= (next_state != S_IDLE); error <= 0; case (next_state) S_PUSH: begin mem[sp] <= push_data; sp <= sp + 1; end S_POP: begin sp <= sp - 1; pop_data <= mem[sp - 1]; end S_ERROR: begin error <= 1; end default: begin end endcase // update flags empty <= (sp == 0); full <= (sp == DEPTH); end end endmodule
Notes:
- The above uses a simple synchronous array and a pointer; push writes at mem[sp] then increments sp; pop decrements sp then reads mem[sp-1].
- For real FPGA BRAM, read/write timing and collision rules can require a one-cycle latency for reads; adjust FSM (introduce intermediate read or write-complete states) accordingly.
- Add handshake signals (valid/ready) if interacting with streaming interfaces.
Software implementation (embedded C / RTOS)
In software, an FSM-based stack is often used when stack operations must coordinate with I/O, interrupts, or when non-blocking behavior is required. Use an explicit state variable and event-driven loop (polling or event callbacks). Below is a simple event-driven C-style pattern suitable for small embedded systems.
// stack_fsm.c #include <stdint.h> #include <stdbool.h> #include <stddef.h> #define DEPTH 16 typedef uint8_t data_t; static data_t mem[DEPTH]; static size_t sp = 0; typedef enum { IDLE, DO_PUSH, DO_POP, HANDLE_ERROR, DONE } state_t; static state_t state = IDLE; static bool push_req = false; static bool pop_req = false; static data_t push_val; static data_t pop_val; static bool error = false; void request_push(data_t v) { push_val = v; push_req = true; } void request_pop(void) { pop_req = true; } void stack_fsm_tick(void) { switch(state) { case IDLE: if (push_req && !pop_req) { if (sp >= DEPTH) { error = true; state = HANDLE_ERROR; } else state = DO_PUSH; } else if (pop_req && !push_req) { if (sp == 0) { error = true; state = HANDLE_ERROR; } else state = DO_POP; } break; case DO_PUSH: mem[sp++] = push_val; push_req = false; state = DONE; break; case DO_POP: pop_val = mem[--sp]; pop_req = false; state = DONE; break; case HANDLE_ERROR: // handle overflow/underflow (log, set flag) state = DONE; break; case DONE: state = IDLE; break; } }
Design notes:
- Call stack_fsm_tick periodically from a main loop or from a scheduler. For RTOS, you can integrate it into a task or run-to-completion handler.
- For concurrency (ISRs, multiple tasks) protect shared state with mutexes or disable interrupts around requests and state transitions.
Handshaking and backpressure
When the stack interfaces with other modules, especially in hardware pipelines or streaming data paths, use a handshake such as valid/ready to avoid data loss and provide backpressure. Example signals:
- in_valid / in_ready for push data
- out_valid / out_ready for pop/peek results
Integrate these into FSM transitions: only move from PUSH_PREP to PUSH_WRITE when in_valid && in_ready; only complete POP_READ when out_ready.
Verification and testing
Hardware:
- Unit tests in simulation: random sequences of push/pop, including boundary cases.
- Assertions: check that sp never exceeds DEPTH or goes below 0; check empty/full signals consistent with sp.
- Formal proofs: use model checking for invariants (e.g., sp ∈ [0, DEPTH]).
- FPGA verification: hardware-in-the-loop tests, timing constraints for BRAM/regs.
Software:
- Unit tests with randomized push/pop sequences.
- Stress tests under interrupts and concurrent access.
- Code review for race conditions; use static analysis tools for concurrency issues.
Optimizations and variations
- Dual-ported BRAM / register files allow simultaneous push and pop on separate ports.
- Circular buffer style stack: uncommon, but useful when implementing a bounded LIFO with overwrite-on-full semantics.
- Pointer encoding: gray-code or error-detecting encodings for resilience in noisy hardware.
- Width/depth trade-offs: wider data vs deeper stack affects BRAM utilization.
- Multi-element push/pop: for throughput, support batching several elements per operation (adds complexity in pointer arithmetic and boundary handling).
- Shadow pointer / speculative push: useful when pipeline needs to speculatively accept data and rollback on misprediction.
Common pitfalls
- Off-by-one errors with stack pointer updates: decide whether sp points to next free slot or the current top and stick with it consistently.
- BRAM read latency: synchronous RAM often returns data one cycle after address, so FSM must accommodate.
- Concurrent access in software: race conditions if requests and pointer updates are not protected.
- Full/empty indicator glitches when push and pop happen simultaneously without atomic handling.
Example use cases
- Return-address stack in small soft-core processors.
- Expression evaluator in hardware calculators or parsing accelerators.
- Command undo buffer in embedded UIs.
- Scratchpad for nested state contexts in a protocol engine.
Summary
Implementing a stack using an FSM yields a robust, deterministic design suitable for both hardware and software. The FSM controls operation sequencing, error handling, and handshaking while a separate storage datapath (registers, BRAM, or an array) holds values. Key considerations include pointer conventions, timing of memory reads/writes, handshake protocols, and concurrency protection. With clear state design, careful verification, and attention to memory timing, an FSM-based stack provides predictable behavior essential for many embedded and hardware systems.
Leave a Reply