How To Use The Typestate Pattern In Rust

As a program runs, its data often moves through several distinct states as it flows through the application.

As you can imagine, keeping track of which state a piece of data is in can become difficult as the amount of data and number of states grow, and as the application complexity increases.

The good news is that there are multiple ways to handle this issue, and in this tutorial we'll be looking at ways to reduce the cognitive burden of state management by utilizing the typestate pattern.

So let's dive in...

Sidenote: If you want to play with the full source code for this post, check out this repo.

What is the typestate pattern?

Like the name implies, the typestate pattern is for managing information that has different states.

It does so by:

Disallowing access to old data
Ensuring state transitions are always correct
Encapsulating functionality for each state

Here are some examples where using the typestate pattern can improve the quality of code:

When closing a network connection, typestate can prevent writing additional data to the closed connection
In a video game, a character can transition from a ground state to a jumping state. Using typestate makes it impossible to introduce a bug where the character can jump again while they are already jumping
Submission of user-generated data may need to go through a validation step. Encoding the validation step using the typestate pattern prevents invalid data from propagating throughout the program

When using the typestate pattern with Rust, the state information gets encoded into the type system. Any misuse of data encoded using typestate will result in a compile-time error, which provides immediate feedback to developers and reduces the number of bugs in your programs.
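To make the network connection example concrete, here is a minimal sketch. The names OpenConnection, ClosedConnection, and send_greeting are made up for illustration, and no real networking happens; the "connection" just records what was written to it.

```rust
// Hypothetical types for illustration; nothing is sent over a network.
struct OpenConnection {
    sent: Vec<String>,
}

struct ClosedConnection {
    total_sent: usize,
}

impl OpenConnection {
    fn open() -> Self {
        OpenConnection { sent: Vec::new() }
    }

    // Writing only borrows the connection, so it stays usable afterwards.
    fn write(&mut self, data: &str) {
        self.sent.push(data.to_string());
    }

    // Closing consumes the open connection and returns the next state.
    fn close(self) -> ClosedConnection {
        ClosedConnection {
            total_sent: self.sent.len(),
        }
    }
}

// Opens a connection, writes twice, and closes it.
fn send_greeting() -> ClosedConnection {
    let mut conn = OpenConnection::open();
    conn.write("hello");
    conn.write("world");
    let closed = conn.close();
    // conn.write("again"); // compile error: `conn` was moved by `close`
    closed
}
```

ClosedConnection has no write method at all, so the only way to write is through an OpenConnection that has not been closed yet.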

How the typestate pattern works in Rust

Typestate works by leveraging the Rust compiler and type system, and it follows a simple process:

Each state in a process gets modeled using a struct,
These states (structs) get moved into state transition functions as inputs, then
The transition functions return the next state in the process

The "moved into" bit is important, as the state transition functions should always take ownership of the state.

Why?

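Well, when the state transition function takes ownership of the state, the caller can no longer use it: the value has been moved, and any later use is a compile-time error. Unless the function passes the value along, it gets dropped when the function returns. This invalidates the old state, which is essential for making typestate work in Rust.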

Borrowing state data

Borrowing data to create the state is fine, but the state structure itself needs to move into the transition function. If a transition only borrows the state, the old state stays usable afterwards, and it becomes possible to break our transitions by repeating a step or by continuing from a state that should no longer exist.
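Here is a small sketch of the difference. The function names borrow_step, move_step, and demo are hypothetical, and the states are empty structs just to keep things short.

```rust
struct A;
struct B;

// Borrowing transition: `A` is only borrowed, so the caller keeps it.
fn borrow_step(_a: &A) -> B {
    B
}

// Consuming transition: `A` is moved in and cannot be used again.
fn move_step(_a: A) -> B {
    B
}

// Returns how many `B` values the borrowing version produced from one `A`.
fn demo() -> usize {
    let a = A;

    // Nothing stops us from running the borrowing transition twice.
    let from_borrow = vec![borrow_step(&a), borrow_step(&a)];

    // The consuming transition can run exactly once.
    let _from_move = move_step(a);
    // let again = move_step(a); // compile error E0382: use of moved value: `a`

    from_borrow.len()
}
```

With the borrowing version, a single A produced two B values. With the consuming version, the compiler rejects the second call.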

It's possible to encode any transition using the typestate pattern, so if your situation needs state to move backwards then just use more typestate!
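As a rough sketch of a backwards transition, imagine a document that can be sent back from review to draft. The Draft, InReview, and Published states and the review_cycle function are invented for this illustration.

```rust
struct Draft {
    text: String,
}

struct InReview {
    text: String,
}

struct Published {
    text: String,
}

impl Draft {
    // Draft -> InReview
    fn submit(self) -> InReview {
        InReview { text: self.text }
    }
}

impl InReview {
    // InReview -> Draft: a backwards step, but still an explicit
    // transition that consumes the current state.
    fn reject(self) -> Draft {
        Draft { text: self.text }
    }

    // InReview -> Published
    fn approve(self) -> Published {
        Published { text: self.text }
    }
}

// Submits a draft, gets it rejected once, revises it, and publishes it.
fn review_cycle(text: &str) -> Published {
    let draft = Draft { text: text.to_string() };
    let review = draft.submit();
    let mut draft = review.reject();
    draft.text.push_str(" (revised)");
    draft.submit().approve()
}
```

Moving backwards is fine as long as it happens through a transition function we wrote on purpose, not because an old state was left lying around.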

Before hopping into examples, let's walk through the process we mentioned earlier to get a feel for how typestate works.

Walkthrough

Imagine we have some process that has 4 stages: A, B, C, D, and each stage occurs one after the other, resulting in three steps.

To make our jobs easier, we should visualize how the process advances through each stage, and we can use arrows (->) to represent each step in the process, like this:

A -> B -> C -> D.

Simple so far, right? However, the typestate pattern operates on two states at a time: an input state, and an output state.

It seems a little complex, but we can break our visualization down into multiple pieces to fit this "1 in, 1 out" pattern. This will reveal the number of transition functions (steps) needed, and the states (stages) they operate on:

A -> B
B -> C
C -> D

Now that we have the states and transitions, we can start writing some code:

// each state gets a struct
struct A;
struct B;
struct C;
struct D;

// the transition functions
fn one(a: A) -> B { B }
fn two(b: B) -> C { C }
fn three(c: C) -> D { D }

fn main() {
  // Let's go!
  let a = A;        // start
  let b = one(a);   // A -> B
  let c = two(b);   // B -> C
  let d = three(c); // C -> D
}

This is a simplified version of how to implement the typestate pattern in Rust.

In the following examples, we'll:

Explore how to prevent users of your code from making arbitrary states,
Create more ergonomic APIs for using the pattern, and
Manage shared data between states

Let's hop into the code!

Example 1: Text formatting with typestate pattern

Let's start small and work on a program that formats some text using multiple steps.

We'll be using the typestate pattern to encode each step into the type system, and we will force the data to go through each step in a predefined order. This is a common way to use the typestate pattern, and you may already have used this in your code without realizing it!

Here are the steps for our formatting algorithm:

Take some raw text (a short phrase) as input,
Parse the raw text into words, then
Format the words

The transformation is simple: "this is some text" will get transformed to "this.is.some.text".

The first thing we need to do is create our states, and we should always use a struct to represent a state when implementing the typestate pattern.

(If your situation feels like it needs an enum, then break out each variant into a struct and use it as a state).

Here are the states for the text formatter:

// type alias makes things a little clearer
type Word = &'static str;

// Each state gets a struct
pub struct RawText(&'static str);
pub struct ParsedText(Vec<Word>);
pub struct FormattedText(String);

Since these are newtype structs with private fields, external users of your code will not be able to create these states directly, even though the structs themselves are public.

This is good because we want the state transition functions to be responsible for generating the next state.
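To see the privacy rule in action, here is a sketch that places one state in its own module. The module name formatter, the as_str accessor, and the make_state function are assumptions added for the demonstration.

```rust
mod formatter {
    // The tuple field has no `pub`, so it is private to this module.
    pub struct RawText(&'static str);

    impl RawText {
        // The only way for outside code to obtain a `RawText`.
        pub fn new(raw: &'static str) -> Self {
            RawText(raw)
        }

        pub fn as_str(&self) -> &'static str {
            self.0
        }
    }
}

use formatter::RawText;

// Code outside the module has to go through `RawText::new`.
fn make_state() -> RawText {
    // let bad = RawText("text"); // compile error: the field is private,
    //                            // so the tuple constructor is not visible here
    RawText::new("text")
}
```

Inside the formatter module the field is still accessible, which is exactly what the transition functions need.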

Next we should make some notes for the state transitions, and we'll refer to this when implementing the transitions:

// Transitions:
//    RawText    -> ParsedText
//    ParsedText -> FormattedText

We know from the above transitions that RawText is the initial state, so we'll create a function to make a new RawText. (All usage must go through this function first, which kicks off the entire process).

We will be writing all our transition functions using impl blocks on our states.

This will give us a nicer API:

impl RawText {
    pub fn new(raw: &'static str) -> Self {
        // The `raw` data is stored as private data, so
        // end-users can't break data within the state.
        Self(raw)
    }
}

Next we can create the transition from RawText to ParsedText.

We'll look at just the function signature first:

impl RawText {
    pub fn parse(self) -> ParsedText {
      /* snip */
    }
}

The parse function takes self as the sole parameter. We are doing this because self (which is a RawText) will get moved into the parse function.

Once moved, RawText will get destroyed at the end of the parse function. Therefore, we must return the next state because RawText will no longer be available once it gets parsed.

Let's fill in the rest of the function by parsing the text into words and wrapping the parsed text into the ParsedText struct:

impl RawText {
    pub fn parse(self) -> ParsedText {
        let parsed = self.0.split(' ').collect();
        ParsedText(parsed)
    }
}

So far so good.

Let's write the next transition from ParsedText to FormattedText:

impl ParsedText {
    // remember: `self` is a `ParsedText`
    pub fn format(self) -> FormattedText {
        FormattedText(self.0.join("."))
    }
}

Just like the parse function, we are consuming self in order to invalidate the ParsedText state. (This function then performs a trivial formatting operation, wraps up the result in a FormattedText state, and then returns it).

To make things more convenient, we will also add a function to the FormattedText state so we can view the text:

impl FormattedText {
    pub fn as_str(&self) -> &str {
        self.0.as_str()
    }
}

And we're done! We created 3 states and 2 transition functions using the typestate pattern.

Let's take a look at how to utilize this example.

API usage example

fn main() {
    // We start off with some raw text
    let raw: RawText = RawText::new("this is some text");

    // We then parse it. The `RawText` is no longer available
    // because we moved it into `parsed`.
    let parsed: ParsedText = raw.parse();

    // Once parsed, we format it. Just like the `RawText`, we
    // cannot access `parsed` because it was moved into
    // `formatted`.
    let formatted: FormattedText = parsed.format();

    assert_eq!(formatted.as_str(), "this.is.some.text");
}

Once the process begins with RawText::new(), movement of the data passes through the states in the order we defined.

This means that we are unable to access data from previous steps, we are not able to transition backwards, and we cannot call any of the functions out of order.

The type system and Rust compiler have guaranteed that this process will move forward using the transitions we defined in the impl blocks for each state struct.

A potential problem...

While much of the time the above solution is sufficient, not all situations are a good fit. For example, it's common to have shared functionality between all of the states, such as the ability to abort at any time.

So how do we get around this?

Well, we can still create this shared functionality using the above approach, but this would mean duplicating a code block across all impl blocks, which isn't an ideal solution.

Instead we can implement shared functionality across all states by combining the typestate pattern with generics, like so...

Generic typestate

Using generics with the typestate pattern is similar to the previous example. The main difference when using a generic typestate is the need to create an extra struct to contain the state.

This extra struct will be generic and use the states as the generic parameter.

Example 2: Traffic signal control

We will be encoding the rules of a traffic signal using the typestate pattern and generics.

The traffic signal has 4 states:

Red
Yellow (amber)
Green
Fault (flashing red)

The transitions for the traffic signal are:

// Red    -> Green
// Green  -> Yellow
// Yellow -> Red
//
// All    -> Fault
// Fault  -> Red

Since we will be using generics, we should also create a trait to use as a constraint.

This is important because we want all known states to have valid transitions, and we will also get documentation for free. If we do not provide a constraint then everything will still work, but users of the API will be able to create invalid states which do nothing.

trait SignalState {}

You'll notice that the trait doesn't have any function signatures declared, and this is fine, because we are using it as a marker trait for the generic constraint.

Let's create the states:

struct Red;
struct Yellow;
struct Green;
struct Fault;

impl SignalState for Red {}
impl SignalState for Yellow {}
impl SignalState for Green {}
impl SignalState for Fault {}

The impl blocks will satisfy the constraint we are placing on the upcoming generic struct.
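If the marker trait is public, code outside your module can still implement it for its own types. One common way to close that gap is a "sealed" trait. The sketch below is an optional extension rather than part of the example we are building: the signal and sealed modules, the Sealed trait, and the is_known_state function are all made-up names.

```rust
mod signal {
    // Private module: code outside `signal` cannot name `Sealed`,
    // so it cannot implement `SignalState` for its own types.
    mod sealed {
        pub trait Sealed {}
    }

    pub trait SignalState: sealed::Sealed {}

    pub struct Red;
    pub struct Green;

    impl sealed::Sealed for Red {}
    impl sealed::Sealed for Green {}

    impl SignalState for Red {}
    impl SignalState for Green {}
}

use signal::{Green, Red, SignalState};

// Accepts only the states defined inside `signal`.
fn is_known_state<S: SignalState>(_state: &S) -> bool {
    true
}

// struct Purple;
// impl SignalState for Purple {} // compile error: `Purple` does not implement `Sealed`
```

With this in place, the set of valid states is fixed by the module that defines them.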

Generics

The generic typestate pattern needs to use generics, so we'll create a generic struct which will be generic over SignalState:

struct TrafficSignal<S: SignalState>;

Alright, that's fine...

Uh oh! This doesn't compile. We set a generic parameter, but it isn't used within the structure.

Rust requires every generic parameter to be used somewhere in the type, so the compiler rejects the struct (error E0392).

We can fix this by using the S in our struct.

(We don't want to store an actual S value, though, because we are only using it as a marker. Just the existence of S is enough for our usage.)

PhantomData

We can fix this using PhantomData:

use std::marker::PhantomData;
struct TrafficSignal<S: SignalState> {
    _marker: PhantomData<S>,
}

PhantomData is a zero-sized type that doesn't use any memory, but it does allow us to satisfy the type checker.

PhantomData exists as a type-checking construct, and doesn't have any impact on our compiled code.

This is great because we can now "use" the S in our struct.
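If you want to check the zero-size claim yourself, std::mem::size_of will report it. This is a quick sketch with only the Red state; the signal_size function is a hypothetical helper.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

trait SignalState {}

struct Red;
impl SignalState for Red {}

struct TrafficSignal<S: SignalState> {
    _marker: PhantomData<S>,
}

// Size in bytes of a traffic signal in the `Red` state.
fn signal_size() -> usize {
    size_of::<TrafficSignal<Red>>()
}
```

The struct contains nothing but the marker, so the whole signal takes up zero bytes.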

Shared functionality

Our transitions listed at the start of this example indicate that we need to be able to transition to a Fault state from any other state.

In our example, faults can occur from factors outside of our control: vehicle collision, broken lights, or maintenance are some examples of things that may happen with a traffic signal.

Since we want the ability to transition to this state at any time, we will create the transition on an impl block which is generic over all SignalState:

impl<S: SignalState> TrafficSignal<S> {
  /* snip */
}

We use TrafficSignal<S> to make this impl block apply to all implementations of TrafficSignal, as long as S implements SignalState (which all our states do).

Let's finish up the impl block:

impl<S: SignalState> TrafficSignal<S> {
    // Helper function to perform state transitions.
    fn transition() -> TrafficSignal<S> {
        TrafficSignal {
            _marker: PhantomData,
        }
    }

    // Transition to a fault state.
    pub fn fault(self) -> TrafficSignal<Fault> {
        TrafficSignal::transition()
    }
}

We create two functions: transition and fault.

The transition function is a helper function that we will be using in other states to transition from one state to another.

It returns a new TrafficSignal<S> where the S is the state required by the transition. The S gets set to the appropriate state based on the return type of the calling function.

The fault function transitions any state to the Fault state. Since all functions in this impl block apply to all states, they are able to call fault when required.

Transition implementations

Now we will implement the state transitions:

impl TrafficSignal<Red> {
    // Red -> Green
    pub fn next(self) -> TrafficSignal<Green> {
        TrafficSignal::transition()
    }
}

impl TrafficSignal<Green> {
    // Green -> Yellow
    pub fn next(self) -> TrafficSignal<Yellow> {
        TrafficSignal::transition()
    }
}

impl TrafficSignal<Yellow> {
    // Yellow -> Red
    pub fn next(self) -> TrafficSignal<Red> {
        TrafficSignal::transition()
    }
}

impl TrafficSignal<Fault> {
    pub fn clear_fault(self) -> TrafficSignal<Red> {
        TrafficSignal::transition()
    }

    pub fn initial() -> TrafficSignal<Fault> {
        TrafficSignal::transition()
    }
}

Since typestate uses the type system, we can see which state we are transitioning from and which we are transitioning to by looking at the types indicated for the generic parameters.

Using Red -> Green as an example:

//           from `Red`
impl TrafficSignal<Red> {
    //                             to `Green`
    pub fn next(self) -> TrafficSignal<Green> {
        TrafficSignal::transition()
    }
}

What does this mean?

For all TrafficSignal with the Red state, we will have a distinct function named next available. next returns a new TrafficSignal which has the state of Green.

This satisfies our transition requirement of Red -> Green.

API usage in this example

Now the TrafficSignal is ready to go, so let's try it out:

fn main() {
    // TrafficSignal starts in a fault state
    // and must be manually cleared.
    let signal: TrafficSignal<Fault> = TrafficSignal::initial();
    let signal: TrafficSignal<Red> = signal.clear_fault();

    // Distinct `next` functions transition from one
    // state to the next:
    let signal: TrafficSignal<Green> = signal.next();
    let signal: TrafficSignal<Yellow> = signal.next();
    let signal: TrafficSignal<Red> = signal.next();

    // We don't need to use any type annotations. Rust
    // keeps track of the state for us:
    let signal = signal.next(); // green
    let signal = signal.next(); // yellow
    let signal = signal.next(); // red
}

Since we defined the initial function on the Fault state, the TrafficSignal always begins in a Fault state. Next the fault gets cleared which transitions the signal to Red. We can then call next to drive operation of the traffic signal.

The type annotations are optional since Rust tracks which types are in use at any point in time. From a development perspective, I think omitting the type annotations results in cleaner code, at least in this case.
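Because fault lives on the generic impl block, we can also write functions that accept a signal in any state. The sketch below repeats a trimmed-down version of the signal (only Red, Green, and Fault) so that it stands on its own; the reset and demo functions are hypothetical additions.

```rust
use std::marker::PhantomData;

trait SignalState {}

struct Red;
struct Green;
struct Fault;

impl SignalState for Red {}
impl SignalState for Green {}
impl SignalState for Fault {}

struct TrafficSignal<S: SignalState> {
    _marker: PhantomData<S>,
}

impl<S: SignalState> TrafficSignal<S> {
    fn transition() -> TrafficSignal<S> {
        TrafficSignal { _marker: PhantomData }
    }

    // Available in every state.
    fn fault(self) -> TrafficSignal<Fault> {
        TrafficSignal::transition()
    }
}

impl TrafficSignal<Red> {
    fn next(self) -> TrafficSignal<Green> {
        TrafficSignal::transition()
    }
}

impl TrafficSignal<Fault> {
    fn initial() -> TrafficSignal<Fault> {
        TrafficSignal::transition()
    }

    fn clear_fault(self) -> TrafficSignal<Red> {
        TrafficSignal::transition()
    }
}

// Hypothetical helper: takes a signal in any state `S` and brings it
// back to `Red` by going through `Fault`.
fn reset<S: SignalState>(signal: TrafficSignal<S>) -> TrafficSignal<Red> {
    signal.fault().clear_fault()
}

fn demo() -> TrafficSignal<Green> {
    let red = TrafficSignal::<Fault>::initial().clear_fault();
    let green = red.next();
    // green.clear_fault(); // compile error: no `clear_fault` on TrafficSignal<Green>
    let red = reset(green);
    red.next()
}
```

The state-specific functions stay locked to their own states, while reset works everywhere because it only relies on what the generic impl block provides.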

More data

So far our states have contained either no items (traffic signal) or 1 item (text formatter).

There are situations which require unique information on a per-state basis, along with persistent data available at all times. The generic typestate pattern is able to handle these situations as well, which we will be exploring in the next example.

Example 3: Chain of custody

In this final example, we'll use the typestate pattern to model product fulfillment, and track the steps a product takes from the warehouse through delivery.

Each step has a distinct set of information attached, which gets logged to produce a chain of custody shared across all states.

Planning

We will start at step 1 and transition through the list in order:

Queued: order for a product was placed
Picking: employee is looking for product
Loading: product is being loaded into truck
OutForDelivery: product is on the way to destination
Delivered: product was delivered
Finalized: process finished

Setup

Just like the traffic signal example, we will be using generic typestate with a trait:

trait PackageState {
    /// Information to be displayed when printing the audit.
    fn audit(&self) -> String;
}

This time around we have some functionality that we need to implement to generate an audit.

We'll write a quick macro to take care of that for us using Debug formatting:

macro_rules! impl_package_state {
    ($($state:ident),+) => {
        $(
        impl PackageState for $state {
            fn audit(&self) -> String {
                format!("{self:?}")
            }
        }
        )+
    };
}

Next we need our states:

// type aliases for clarity
type Name = &'static str;
type Address = &'static str;
type Date = &'static str;

#[derive(Debug)]
struct Queued;

#[derive(Debug)]
struct Picking {
    picker: Name,
}

#[derive(Debug)]
struct Loading {
    truck_id: u32,
}

#[derive(Debug)]
struct OutForDelivery {
    driver: Name,
    address: Address,
    truck_id: u32,
}

#[derive(Debug)]
struct Delivered {
    date: Date,
}

#[derive(Debug)]
struct Finalized;

impl_package_state!(
    Queued,
    Picking,
    Loading,
    OutForDelivery,
    Delivered,
    Finalized
);

As mentioned earlier, some of the states have additional information associated with them, and this information gets saved into the chain of custody as we transition between states.

Now we can create a generic Package struct which contains the chain of custody and an item_number applicable to all states:

use std::rc::Rc;

struct Package<S: PackageState> {
    item_number: u32,
    custody: Vec<Rc<dyn PackageState>>,
    state: Rc<S>,
}

We store the current state of the package in the state field, so there is no need for PhantomData tricks this time around.

The custody is a collection of trait objects containing the current state and previous states that the Package had.

Instead of the usual Box<dyn Trait> we are opting for Rc<dyn Trait>. The reason for this is to allow state to also exist in custody.

So when iterating over custody, the current state is available along with the historical states.

Note: If you need a refresher on Rc, check out this article in the official Rust docs/book.
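Here is a small sketch of that sharing, separate from the Package struct. It uses only the Loading state, and the share_state function is a made-up helper that returns the reference count, the audit text, and the truck id.

```rust
use std::rc::Rc;

trait PackageState {
    fn audit(&self) -> String;
}

#[derive(Debug)]
struct Loading {
    truck_id: u32,
}

impl PackageState for Loading {
    fn audit(&self) -> String {
        format!("{:?}", self)
    }
}

fn share_state() -> (usize, String, u32) {
    // One allocation holding the concrete state.
    let state: Rc<Loading> = Rc::new(Loading { truck_id: 8 });

    // Cloning the `Rc` copies the pointer, not the `Loading` value.
    // The clone is coerced to a trait object when pushed into the history.
    let mut custody: Vec<Rc<dyn PackageState>> = Vec::new();
    custody.push(state.clone());

    // Two owners of the same value: `state` and the entry in `custody`.
    (Rc::strong_count(&state), custody[0].audit(), state.truck_id)
}
```

The typed handle (Rc<Loading>) still gives access to truck_id, while the entry in custody only exposes what the PackageState trait offers.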

Transitioning between states

Here comes the fun part: transitioning between states.

In the text formatting example, we passed the state as an argument to a function when transitioning. This wasn't an issue because there was just one piece of information. Also, the traffic light example didn't have any extra information other than the markers, so there were no issues there either.

This time though, we have the item_number and custody fields that need to move between all states, and each state has its own set of information...

We could use a bunch of function parameters and duplicate the signatures for each state, and then add state-exclusive information on top of that, but that's a pretty bad solution and is a maintenance nightmare.

Instead, we can provide a generic transition function similar to the traffic light example, and handle the movement of data there.

We'll start with just the impl block and function signature, and then fill it in after we understand what we need to do:

impl<S: PackageState> Package<S> {
    fn transition<N: PackageState + 'static>(self, next: N) -> Package<N> {
      /* snip */
    }
}

So let's break this down.

We have a generic implementation block using the S parameter which represents the current state. The implementation is for Package<S>, so everything in this block applies to all Package, regardless of the state it's in.

The transition function introduces another generic parameter N. This is the next state that we will be transitioning to.

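In addition to the PackageState constraint, N must also be 'static. This means that N cannot contain borrowed data that might expire: it either owns all of its data or only holds 'static references. We need this because custody stores Rc<dyn PackageState>, and a trait object written this way carries an implicit 'static bound. If we don't put the 'static bound, then N could hold a reference that becomes invalid while the state is still saved in the chain of custody inside Package, so the compiler refuses to store it.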

The return type for transition is Package<N> which will create a new Package having the next state (N). So we will need to create a new Package in this function and set the state field of that Package to next.
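The 'static bound is easier to see with a state that borrows its data. In this sketch, Note<'a> and the record and demo functions are invented for the illustration, and Delivered owns a String instead of the &'static str used in the example.

```rust
use std::rc::Rc;

trait PackageState {
    fn audit(&self) -> String;
}

// Owns all of its data, so it satisfies `'static`.
struct Delivered {
    date: String,
}

// Borrows its data for a lifetime `'a`, so it is `'static`
// only when `'a` itself is `'static`.
struct Note<'a> {
    text: &'a str,
}

impl PackageState for Delivered {
    fn audit(&self) -> String {
        format!("Delivered on {}", self.date)
    }
}

impl<'a> PackageState for Note<'a> {
    fn audit(&self) -> String {
        format!("Note: {}", self.text)
    }
}

// `Rc<dyn PackageState>` is shorthand for `Rc<dyn PackageState + 'static>`,
// so `N` needs the `'static` bound before it can be stored.
fn record<N: PackageState + 'static>(custody: &mut Vec<Rc<dyn PackageState>>, next: N) {
    let next: Rc<dyn PackageState> = Rc::new(next);
    custody.push(next);
}

fn demo() -> Vec<String> {
    let mut custody: Vec<Rc<dyn PackageState>> = Vec::new();
    record(&mut custody, Delivered { date: String::from("Friday") });
    // A string literal is a `&'static str`, so this `Note` is allowed.
    record(&mut custody, Note { text: "left at door" });

    // let local = String::from("temporary");
    // record(&mut custody, Note { text: &local });
    // compile error: `local` does not live long enough

    custody.iter().map(|state| state.audit()).collect()
}
```

So 'static here doesn't mean the value lives forever. It means the value doesn't depend on any borrow that could end too early.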

Let's finish up this function:

impl<S: PackageState> Package<S> {
    fn transition<N: PackageState + 'static>(self, next: N) -> Package<N> {
        let mut custody = self.custody;
        let next = Rc::new(next);
        custody.push(next.clone());

        Package {
            item_number: self.item_number,
            custody,
            state: next,
        }
    }
}

The body of the function isn't too spectacular.

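Since transition takes ownership of self, we can make the custody data mutable. We then wrap the next state in an Rc and push a clone of that Rc into custody, where it gets stored as a trait object. Cloning an Rc only copies the pointer and increments the reference count, so state and custody share the same value. This is how the chain of custody gets managed.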

The last part of the function makes a new Package and sets the fields with the appropriate data. Whatever type the state field is will also be the type of Package that gets returned.

For example

If we call transition with next being the Loading state, then we will get a return type of Package<Loading> because the state field has the Loading type.

Let's finish up this impl block with a function to visualize the chain of custody. We put it here so we can get a chain of custody at any point in the process:

impl<S: PackageState> Package<S> {
    pub fn chain_of_custody(&self) -> String {
        self.custody
            .iter()
            .enumerate()
            .map(|(i, ev)| {
                // List each state starting from 1
                let i = i + 1;
                format!("{i}: {}", ev.audit())
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}

Transition functions

Next up are the transition functions.

Starting from the first step in the process, Queued:

impl Package<Queued> {
    pub fn new(item_number: u32) -> Package<Queued> {
        Package {
            item_number,
            custody: vec![Rc::new(Queued)],
            state: Rc::new(Queued),
        }
    }

    pub fn pick(self, picker: Name) -> Package<Picking> {
        self.transition(Picking { picker })
    }
}

The Queued state has both a transition function (pick) and the new function. We include the new function on this implementation so we can use Package::new() to get the process started.

This function sets the state to the first state (Queued) and also puts the first item into the chain of custody field.

The pick function runs the transition function that we wrote and sets the next state to Picking. We also include the name of the picker as additional information required by the Picking state.

The remaining transition functions follow the same pattern as the pick function:

Take self as the first parameter
Maybe have extra parameters needed for transitioning to the next state
Call self.transition() with the next state

Here is the code for the transitions:

impl Package<Picking> {
    pub fn load(self, truck_id: u32) -> Package<Loading> {
        let next_state = Loading { truck_id };
        self.transition(next_state)
    }
}

impl Package<Loading> {
    pub fn deliver_to(self, driver: Name, address: Address) -> Package<OutForDelivery> {
        let truck_id = self.state.truck_id;
        self.transition(OutForDelivery {
            driver,
            address,
            truck_id,
        })
    }
}

impl Package<OutForDelivery> {
    pub fn delivered(self, date: Date) -> Package<Delivered> {
        self.transition(Delivered { date })
    }
}

impl Package<Delivered> {
    pub fn finalize(self) -> Package<Finalized> {
        self.transition(Finalized)
    }
}

In the Loading state, the truck_id comes from the current state's fields.

Remember, when we use generics we are creating a distinct type made up of the generic elements. So when the current state is Loading, we have access to a truck_id field on the state.

You can imagine that the data structure looks something like this:

struct Package__Loading {
    item_number: u32,
    custody: Vec<Rc<dyn PackageState>>,
    state: Rc<Loading>, // contains truck_id field
}

struct Loading {
    truck_id: u32,
}

API usage

We can now take a look at the API:

fn main() {
    // Type annotations included to see transitions:
    let pkg: Package<Queued> = Package::new(123);
    let pkg: Package<Picking> = pkg.pick("Sandra");
    let pkg: Package<Loading> = pkg.load(8);
    let pkg: Package<OutForDelivery> = pkg.deliver_to("Gary", "Alaska");
    let pkg: Package<Delivered> = pkg.delivered("Friday");
    let pkg: Package<Finalized> = pkg.finalize();

    println!("{}", pkg.chain_of_custody());

    // No annotations:
    let pkg = Package::new(123);
    let pkg = pkg.pick("Sandra");
    let pkg = pkg.load(8);
    let pkg = pkg.deliver_to("Gary", "Alaska");
    let pkg = pkg.delivered("Friday");
    let pkg = pkg.finalize();
}

When using the typestate pattern, it's possible to chain state transitions for a more ergonomic API:

fn main() {
    // Bonus - chained state transitions:
    let pkg = Package::new(123)
        .pick("Sandra")
        .load(8)
        .deliver_to("gary", "alaska")
        .delivered("Friday")
        .finalize();
}

These chained transitions are inherent to the typestate pattern, and you can use chaining in the other examples as well, but if you have any borrowed data in your states, then lifetime problems can occur.

If this happens, then using another let binding should clear up any problems.
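Here is a sketch of the kind of lifetime problem that can show up. It uses a hypothetical variant of the text formatter that borrows its input for a lifetime 'a instead of requiring &'static str; read_input and demo are also made up for the illustration.

```rust
// Hypothetical variant of the text formatter with borrowed input.
struct RawText<'a>(&'a str);
struct ParsedText<'a>(Vec<&'a str>);

impl<'a> RawText<'a> {
    fn new(raw: &'a str) -> Self {
        RawText(raw)
    }

    fn parse(self) -> ParsedText<'a> {
        ParsedText(self.0.split(' ').collect())
    }
}

impl<'a> ParsedText<'a> {
    fn format(self) -> String {
        self.0.join(".")
    }
}

// Stand-in for text that only exists at runtime.
fn read_input() -> String {
    String::from("this is some text")
}

fn demo() -> String {
    // Chaining straight from a temporary fails once the borrowed
    // state is used after the statement ends:
    //
    // let parsed = RawText::new(&read_input()).parse();
    // parsed.format() // compile error E0716: temporary value dropped while borrowed

    // Binding the owner first keeps it alive for the whole chain.
    let input = read_input();
    let parsed = RawText::new(&input).parse();
    parsed.format()
}
```

The extra let binding gives the String a home that outlives every state borrowing from it.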

The output for the program when using pkg.chain_of_custody() on a finalized delivery:

1: Queued
2: Picking { picker: "Sandra" }
3: Loading { truck_id: 8 }
4: OutForDelivery { driver: "Gary", address: "Alaska", truck_id: 8 }
5: Delivered { date: "Friday" }
6: Finalized

Since the chain_of_custody function implementation was on Package<S>, calling it at any time will result in the current chain of custody at that moment in time.
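To illustrate, here is a cut-down sketch with just the Queued and Picking states that asks for the chain of custody twice: once right after creation and once after picking. The audits function is a hypothetical helper, the pick transition is written out by hand to keep the block short, and item_number is left out.

```rust
use std::rc::Rc;

trait PackageState {
    fn audit(&self) -> String;
}

#[derive(Debug)]
struct Queued;

#[derive(Debug)]
struct Picking {
    picker: &'static str,
}

impl PackageState for Queued {
    fn audit(&self) -> String {
        format!("{:?}", self)
    }
}

impl PackageState for Picking {
    fn audit(&self) -> String {
        format!("{:?}", self)
    }
}

struct Package<S: PackageState> {
    custody: Vec<Rc<dyn PackageState>>,
    state: Rc<S>,
}

impl<S: PackageState> Package<S> {
    // Available in every state, so it can be called at any point.
    fn chain_of_custody(&self) -> String {
        self.custody
            .iter()
            .enumerate()
            .map(|(i, ev)| format!("{}: {}", i + 1, ev.audit()))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

impl Package<Queued> {
    fn new() -> Self {
        let state = Rc::new(Queued);
        let first: Rc<dyn PackageState> = state.clone();
        Package { custody: vec![first], state }
    }

    fn pick(self, picker: &'static str) -> Package<Picking> {
        let mut custody = self.custody;
        let state = Rc::new(Picking { picker });
        let entry: Rc<dyn PackageState> = state.clone();
        custody.push(entry);
        Package { custody, state }
    }
}

// Returns the audit text before and after the `pick` transition.
fn audits() -> (String, String) {
    let pkg = Package::new();
    let early = pkg.chain_of_custody();
    let pkg = pkg.pick("Sandra");
    let late = pkg.chain_of_custody();
    (early, late)
}
```

The first audit contains only the Queued entry, and the second one has grown by the Picking entry.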

Now it's your turn...

As you can see, the typestate pattern offers an ergonomic way to guarantee that data flows through your program in the order you define, without you having to try and keep track of it on your own.

The question now is, how are you going to use this in your own Rust projects? Maybe to help build your own full-stack Twitter clone?

Whatever you decide to build, if you're a member of our private Discord community, be sure to let me know and share your projects with us in the dedicated Rust channel.

And if you're not a member, or find yourself getting stuck with setting any of this up, or just want to get better at developing with Rust, then be sure to check out my Complete Rust Developer course.

You'll learn how to code and build your own real-world applications using Rust so that you can get hired this year. No previous programming or Rust experience needed, and you can check out the first few lessons for free - no card needed.