Lab 4: Shell and Bootloader Phase 2

Feb 11, 2025    m. Feb 12, 2025    #lab  

Getting Started

Fetch the update for lab 4 from our git repository to your development machine.

git fetch skeleton
git merge skeleton/lab4

You may need to resolve conflicts before continuing. For example, if you see a message that looks like:

Auto-merging kern/src/main.rs
CONFLICT (content): Merge conflict in kern/src/main.rs
Automatic merge failed; fix conflicts and then commit the result.

You will need to manually modify the relevent files to resolve the conflict. Ensure that you keep all of your changes from lab 3.
Once all conflicts are resolved, add the resolved files with git add and commit.
For more information on resolving merge conflicts, see this tutorial on githowto.com .

Phase 2: Not a Seashell

Due to Raspberry Pi devices being unavailable at the moment, we have to skip the implementation parts of Subphase B and C.
The template code already has the implementation ready. Please read through them to understand the code though.

In this phase, you will be implementing drivers for the built-in timer, GPIO, and UART devices. You’ll use then these drivers to implement a simple shell. In the next phase, you’ll use the same drivers to implement a bootloader.

What’s a driver?

The term driver, or device driver, describes software that directly interacts with and controls a hardware device. Drivers expose a higher-level interface to the hardware they control. Operating systems may interact with device drivers to expose an even higher-level interface. For instance, the Linux kernel exposes ALSA (Advanced Linux Sound Architecture), an audio API, which interacts with device drivers that in-turn interact directly with sound cards.

Subphase A: Getting Started

Project Structure

Let’s recall the repository structure we saw previously.

.
├── ...
├── boot : bootloader *
├── kern : the main os kernel *
└── lib  : required libraries
├── pi *
├── shim
├── stack-vec *
├── ttywrite *
├── volatile *
└── xmodem *

All the libraries used by boot and kernel are located under the lib directory.

shim library selectively depends on either std or no_std library. With #[cfg(feature = "no_std")] specified, shim makes use of core_io and the custom no_std module which has minimum library we need such as ffi, path and sync. Otherwise, mostly in the test code, shim just uses std library.

pi subdirectory contains all of your driver code. The pi library makes use of the volatile library. It also depends on the shim library.

boot and kernel make use of the pi library to communicate with hardware. They also depend on shim. In addition to that, boot also depends on the xmodem library, and kernel depends on the stack-vec library. The volatile library has no dependencies. The diagram below illustrates these relationships:

Kernel

The kern directory contains the code for the operating system kernel: the core of your operating system. Calling make inside this directory builds the kernel. The build output is stored in the build/ directory. To run the kernel, use make qemu command. Please refer the Tools page to find details about our Makefile.

At present, the kernel does absolutely nothing. By the end of this phase, the kernel will start up a shell which you can communicate with.

As we saw above, the kernel crate depends on the pi library. As a result, you can use all of the types and items from the pi library in the kernel.

Documentation

While writing your device drivers, you’ll want to keep the BCM2837 ARM Peripherals Manual open.

Subphase B: System Timer

Due to Raspberry Pi devices being unavailable at the moment, we have to skip the implementation parts of this Subphase.
The template code already has the implementation ready. Please read through the following parts and understand the code though.

In this subphase, you will write a device driver for the ARM system timer. You will primarily be working in lib/pi/src/timer.rs and kern/src/main.rs. The ARM system timer is documented on page 172 (section 12) of the BCM2837 ARM Peripherals Manual .

Start by looking at the existing code in lib/pi/src/timer.rs. In particular, note the relationship between the following sections:

const TIMER_REG_BASE: usize = IO_BASE + 0x3000;

#[repr(C)]
struct Registers {
    CS: Volatile<u32>,
    CLO: ReadVolatile<u32>,
    CHI: ReadVolatile<u32>,
    COMPARE: [Volatile<u32>; 4]
}

pub struct Timer {
    registers: &'static mut Registers
}

impl Timer {
    pub fn new() -> Timer {
        Timer {
            registers: unsafe { &mut *(TIMER_REG_BASE as *mut Registers) },
        }
    }
}

The one line of unsafe in this program is very important: it casts the TIMER_REG_BASE address to a *mut Registers and then casts that to an &'static mut Registers. We are telling Rust that we have a static reference to a Registers structure at address TIMER_REG_BASE.

What is at the TIMER_REG_BASE address? On page 172 of the BCM2837 ARM Peripherals Manual , you’ll find that 0x3000 is the peripheral offset for the ARM system timer. Thus, TIMER_REG_BASE is the address at which the ARM system timer registers start! After this one line of unsafe, we can use the registers field to access the timer’s registers safely. We can read the CLO register with self.registers.CLO.read() and write the CS register with self.registers.CS.write(), then combine them together to represent the number of elapsed microseconds.

Why can’t you write to CLO or CHI? (restricted-reads)

The BCM2837 documentation states that the CLO and CHI registers are read-only. Our code enforces this property. How? What prevents us from writing to CLO or CHI?

What exactly is unsafe?

In short, unsafe is a marker for the Rust compiler that you’re taking control of memory safety: the compiler won’t protect you from memory issues. As a result, in unsafe sections, Rust lets you do anything you can do in C. In particular, you can cast between types with more freedom, dereference raw pointers, and fabricate lifetimes.

But note that unsafe is very unsafe. You must ensure that everything you do in an unsafe section is, in fact safe. This is more difficult than it sounds, especially when Rust’s idea of safe is much stricter than in other languages. As such, you should try not to use unsafe at all. For operating systems, unfortunately, we must use unsafe so that we can directly speak to hardware, but we’ll typically limit our use to one line per driver.

If you want to learn more about unsafe, read Chapter 1 of the Nomicon .

Implement the Driver

Implement the Timer::read(), current_time(), and spin_sleep() in lib/pi/src/timer.rs. The signatures on these items indicate their expected functionality. You’ll need to read the timer’s documentation in the BCM manual to implement Timer::read(). In particular, you should understand which registers to read to obtain the timer’s current u64 value. You can build the pi library with cargo build. You can also use cargo check to type-check the library without actually compiling it.

You’ll find the core::time::Duration page useful.

Testing Your Driver

Let’s test your driver by ensuring that spin_sleep() is accurate. We’ll write the code to do this in kern/src/main.rs.

Copy your LED blinky code from phase 4 of lab 1 into main.rs. Instead of the for loop based sleep function, use your newly written spin_sleep() function with Duration to pause between blinks. Compile the kernel, load it onto the MicroSD card as kernel8.img, and then run it on the Raspberry Pi. Ensure that the LED blinks at the frequency that you intended it to. Try other pause times and ensure that they all work as expected. Until you write the bootloader in phase 3, you’ll need to keep swapping the MicroSD card between the Pi and your computer to try out different binaries.

If your timer driver is working as expected, proceed to the next subphase.

Subphase C: GPIO

Due to Raspberry Pi devices being unavailable at the moment, we have to skip the implementation parts of this Subphase.
The template code already has the implementation ready. Please read through the following parts and understand the code though.

In this subphase, you will write a generic, pin-independent device driver for GPIO. You will primarily be working in lib/pi/src/gpio.rs and kern/src/main.rs. The GPIO subsystem is documented on page 89 (section 6) of the BCM2837 ARM Peripherals Manual .

State Machines

All hardware devices are state machines : they begin at a predetermined state and transition to different states based on explicit or implicit inputs. The device exposes different functionality depending on which state it is in. In other words, only some transitions are valid in some states. Importantly, this implies that some transitions are invalid when the device is in a given state.

Most programming languages make it impossible to faithfully encode the semantics of a state machine in hardware, but not Rust! Rust lets us perfectly encode state machine semantics, and we’ll take advantage of this to implement a safer-than-safe device driver for the GPIO subsystem. Our driver will ensure that a GPIO pin is never misused, and it will do so at compile-time.

Below is the state diagram for a subset of the GPIO state machine for a single pin:

GPIO State Diagram

Our goal is to encode this state machine in Rust. Let’s start by interpreting the diagram:

Which transitions did you follow in your lab 1 blinky? (blinky-states)

When you implementing the blinky code in phase 4 of lab 1, you implicitly implemented a subset of this state machine. Which transitions did your code implement?

We’ll use Rust’s type system to ensure that a pin can only be SET and CLEARed if it has been transitioned to the OUTPUT state and the LEVEL read if it is in the INPUT state. Take a look at the declaration for the GPIO structure in lib/pi/src/gpio.rs:

pub struct Gpio<State> {
    pin: u8,
    registers: &'static mut Registers,
    _state: PhantomData<State>
}

The structure has one generic argument, State. Except for PhantomData, nothing actually uses this argument. This is what PhantomData is there for: to convince Rust that the structure somehow uses the generic even though it otherwise wouldn’t. We’re going to use the State generic to encode which state the Gpio device is in. Unlike other generics, we must control this parameter and ensure that a client can never fabricate it.

The state! macro generates types that represent the states a Gpio can be in:

states! {
    Uninitialized, Input, Output, Alt
}

// Each parameter expands to an `enum` that looks like:
enum Input { }

This is also weird; why would we create an enum with no variants? enum’s with no variants have a nice property: they can never be instantiated. In this way, these types act purely as markers. No one can ever pass us a value of type Input because such a value can never be constructed. They exist purely at the type-level.

We can then implement methods corresponding to valid transitions given that a Gpio is in a certain state:

impl Gpio<Output> {
    /// Sets (turns on) the pin.
    pub fn set(&mut self) { ... }

    /// Clears (turns off) the pin.
    pub fn clear(&mut self) { ... }
}

impl Gpio<Input> {
    /// Reads the pin's value.
    pub fn level(&mut self) -> bool { ... }
}

This ensures that a Gpio can only be set and cleared when it is a Gpio<Output> and its level read when it is a Gpio<Input>. Perfect! But how do we actually transition between states? Hello, Gpio::transition()!

impl<T> Gpio<T> {
    fn transition<S>(self) -> Gpio<S> {
        Gpio {
            pin: self.pin,
            registers: self.registers,
            _state: PhantomData
        }
    }
}

This method lets us transition a Gpio from any state to any other state. Given a Gpio in state T, this method returns a Gpio in state S. Note that it works for all S and T. We must be very careful when calling this method. When called, we are encoding the specification of a transition in the state diagram. If we get the specification or encoding wrong, our driver is wrong.

To use the transition() method, we need to tell Rust which type we want as an output S in Gpio<S>. We do this by giving Rust enough information so that it can infer the S type. For instance, consider the implementation of the into_output method:

pub fn into_output(self) -> Gpio<Output> {
    self.into_alt(Function::Output).transition()
}

This method requires its return type to be Gpio<Output>. When the Rust type system inspects the call to transition(), it will search for a Gpio::transition() method that returns a Gpio<Output> to satisfy the requirement. Since our transition method returns Gpio<S> for any S, Rust will replace S with Output and use that method. The result is that we’ve transformed our Gpio<Alt> (from the into_alt() call) into a Gpio<Output>.

What would go wrong if a client fabricates states? (fake-states)

Consider what would happen if we let the user choose the initial state for a Gpio structure. What could go wrong?

Why is this only possible with Rust?

Notice that the into_ transition methods take a Gpio by move. This means that once a Gpio is transitioned into a another state, it can never be accessed in the previous state. Rust’s move semantics make this possible. As long as a type doesn’t implement Clone, Copy, or some other means of duplication, there is no coming back from a transition. No other language, not even C++, affords us this guarantee at compile-time.

Implement the Driver

Implement the unimplemented!() methods in lib/pi/src/gpio.rs. The signatures on these items indicate their expected functionality. You’ll need to read the GPIO documentation (page 89, section 6 of the BCM2837 ARM Peripherals Manual ) to implement your driver. Remember that you can use cargo check to type-check the library without actually compiling it.

Testing Your Driver

We’ll again write code in kern/src/main.rs to ensure that our driver works as expected.

Instead of reading/writing to raw memory addresses, use your new GPIO driver to set and clear GPIO pin 16. Your code should get a lot cleaner. Compile the kernel, load it onto the MicroSD card as kernel8.img, run it on the Raspberry Pi, and ensure your LED blinks as before.

Now, connect more LEDs to your Raspberry Pi. Use GPIO pins 5, 6, 13, 19, and 26. Refer to the pin numbering diagram from assignment 0 to determine their physical location. Have your kernel blink all of the LEDs in a pattern of your choice.

Which pattern did you choose? (led-pattern)

What pattern did you have your LEDs blink in? If you haven’t yet decided, one fun idea is to have them imitate a “loading spinner” by arranging the LEDs in a circle and turning them on/off in a sequential, circular pattern.

spinner

Once your GPIO driver is working as expected, proceed to the next subphase.

Subphase D: UART

In this subphase, you will write a device driver for the mini UART device on the Raspberry Pi. You will primarily be working in lib/pi/src/uart.rs and kern/src/main.rs. The mini UART is documented on page 8 and page 10 (sections 2.1 and 2.2) of the BCM2837 ARM Peripherals Manual .

UART: Universal Asynchronous RX/TX

A UART , or universal asynchronous receiver-transmitter, is a device and serial protocol for communicating over two wires. These are the two wires (rx/tx) that you used in phase 1 of lab 0 to connect the UART device on the CP2102 USB module to the UART device on the Pi. You can send any kind of data over UART: text, binaries, images, anything! As an example, in the next subphase, you’ll implement a shell by reading from the UART device on the Pi and writing to the UART device on the CP2102 USB module. In phase 3, you’ll read from the UART on the Pi to download a binary being sent via the UART on the CP2102 USB module.

The UART protocol has several configuration parameters, and both the receiver and transmitter need to be configured identically to communicate. These parameters are:

The mini UART on the Pi does not support parity bits and only supports 1 stop bit. As such, only the baud rate and data frame length need to be configured. To learn more about UART, see the Basics of UART Communication article.

Implement the Driver

At this point, you have all of the tools to write a device driver without additional background information (congratulations!).

Implement the mini UART device driver in lib/pi/src/uart.rs. You’ll need to complete the definition of the Registers structure. Ensure that you use the Volatile type with the minimal set of capabilities for each register: read-only registers should use ReadVolatile, write-only registers should use WriteVolatile, and reserved space should use Reserved. Then, initialize the device in new() by setting the baud rate to 115200 (a divider of 270) and data length to 8 bits. Finally, implement the remaining unimplemented!() methods and the fmt::Write, io::Read and io::Write traits for MiniUart.

You’ll need to write to the LCR, BAUD, and CNTL registers in new.

Use your GPIO driver from the previous subphase.

Testing Your Driver

Test your driver by writing a simple “echo” program in kern/src/main.rs: sit in a hot loop writing out every byte you read in. In pseudocode, this looks like:

loop {
    write_byte(read_byte())
}

If you want to work in a single tty, just use make qemu to get qemu output to stdio directly.

Alternatively, use make qemu_screen to run qemu over pty - a pseudo tty.
Then use screen /dev/<your-path> 115200 in another terminal to communicate over UART.
(Another terminal can be spawned using docker exec -it <container> bash).
screen sends every keypress over the TTY, so if your echo program works correctly, you’ll see every character you type.

Exit qemu using Ctrl-a , x. Similarly, exit screen using Ctrl-a, Ctrl-d.

It might help to send an extra character or two each time you receive a byte to convince yourself things are working as you expect:

loop {
    write_byte(read_byte())
        write_str("<-")
}

Once your driver works as expected, proceed to the next subphase.

Subphase E: The Shell

In this subphase, you’ll use your new UART driver to implement a simple shell that will be the interface to your operating system. You will be working in kern/src/console.rs, kern/src/shell.rs, and kernel/src/main.rs.

The Console

To write our shell, we’ll need some notion of a global default input and output. Unix and friends typically refer to this is as stdin and stdout; we’ll be calling it Console. Console will allow us to implement the kprint! and kprintln! macros, our kernel-space versions of the familiar print! and println!, and give us a default source for reading user input. We’ll use Console and these macros to implement our shell.

Take a peek at kernel/src/console.rs. The file contains an unfinished implementation of the Console struct. Console is a singleton wrapper around a MiniUart: only one instance of Console will ever exist in our kernel. That instance will be globally available, for use anywhere and by anything. This will allow us to read and write to the mini UART without explicitly passing around an instance of MiniUart or Console.

Global Mutability

The notion of a globally mutable structure is a scary thought, especially in the face of Rust. After all, Rust doesn’t allow more than one mutable reference to a value, so how can we possibly convince it to allow as many as we want? The trick, of course, relies on unsafe. The idea is as follows: we’ll tell Rust that we’re only going to read a value by using an immutable reference, but what we actually do is use unsafe to “cast” that immutable reference to a mutable reference. Because we can create as many immutable references as we want, Rust will be none the wiser, and we’ll have all of the mutable references we desire!

Such a function might look like this:

// This function must never exist.
fn make_mut<T>(value: &T) -> &mut T {
    unsafe { /* magic */ }
}

Your alarm bells should be ringing: what we’ve proposed so far is wildly unsafe. Recall that we still need to ensure that everything we do in unsafe upholds Rust’s rules. What we’ve proposed thus far clearly does not. As it stands, we’re violating the “at most one mutable reference at a time” rule. The rule states that at any point in the program, a value should have at most one mutable reference to it.

The key insight to maintaining this rule while meeting our requirements is as follows: instead of the compiler checking the rule for us with its borrow and ownership checker, we will ensure that the rule is upheld dynamically, at run-time. As a result, we’ll be able to share references to a structure as many times as we want (via an & reference) while also being able to safely retrieve a mutable reference when we need it (via our &T -> &mut T dynamic borrow checking function).

There are many concrete implementations of this idea. One such implementation ensures that only one mutable reference is returned at a time using a lock:

fn lock<T>(value: &T) -> Locked<&mut T> {
    unsafe { lock(value); cast value to Locked<&mut T> }
}

impl Drop for Locked<&mut T> {
    fn drop(&mut self) { unlock(self.value) }
}

This is known as Mutex in the standard library. Another way is to abort the program if more than one mutable reference is about to be created:

fn get_mut<T>(value: &T) -> Mut<&mut T> {
    unsafe {
        if ref_count(value) != 0 { panic!() }
        ref_count(value) += 1;
        cast value to Mut<&mut T>
    }
}

impl Drop for Mut<&mut T> {
    fn drop(&mut self) { ref_count(value) -= 1; }
}

This is RefCell::borrow_mut() . And yet another is to only return a mutable reference if it is known to be exclusive:

fn get_mut<T>(value: &T) -> Option<Mut<&mut T>> {
    unsafe {
        if ref_count(value) != 0 { None }
        else {
            ref_count(value) += 1;
            Some(cast value to Mut<&mut T>)
        }
    }
}

impl Drop for Mut<&mut T> {
    fn drop(&mut self) { ref_count(value) -= 1; }
}

This is RefCell::try_borrow_mut() . All of these examples implement some form of “interior mutability”: they allow a value to be mutated through an immutable reference. For our Console, we’ll be using Mutex to accomplish the same goal. Since the std::Mutex implementation requires operating system support, we’ve implemented our own Mutex in kern/src/mutex.rs. Our implementation is correct for now, but we’ll need to fix it when we introduce caching or concurrency to continue to uphold Rust’s rules. You don’t need to understand the Mutex implementation for now, but you should understand how to use one.

The global singleton is declared as CONSOLE in kern/src/console.rs. The global variable is used by the kprint! and kprintln! macros defined below below. Once you’ve implemented Console, you’ll be able to use kprint! and kprintln! to print to the console. You’ll also be able to use CONSOLE to globally access the console.

Rust also requires static globals to be Sync.

In order to store a value of type T in a static global, T must implement Sync. This is because Rust also guarantees data race safety at compile-time. Because global values can be accessed from any thread, Rust must ensure that those accesses are thread-safe. The Send and Sync traits, along with Rust’s ownership system, ensure data race freedom.

Why should we never return an &mut T directly? (drop-container)

You’ll notice that every example we’ve provided wraps the mutable reference in a container and then implements Drop for that container. What would go wrong if we returned an &mut T directly instead?

Where does the write_fmt call go? (write-fmt)

The _print helper function calls write_fmt on an instance of MutexGuard<Console>, the return value from Mutex<Console>::lock(). Which type will have its write_fmt method called, and where does the method implementation come from?

Implement and Test Console

implement all of the unimplemented!() methods in kern/src/console.rs. once you’ve implemented everything, use the kprint! and kprintln! macros in kern/src/main.rs to write to the console when you receive a character. you can use these macros exactly like print! and println!. use screen /dev/<your-path> 115200 to communicate with your pi and ensure that your kernel works as expected.

If this were C…

The fact that we get a println! implementation for free with zero effort is just another advantage to using Rust. If this were C, we’d need to implement printf ourselves. In Rust, the compiler provides a generic, abstracted, and safe OS-independent implementation. Whew!

Your Console implementations should be very short: about a line each.

Implement the Shell

Finished Product

You’re now ready to implement the shell in kern/src/shell.rs. We’ve provided a Command structure for your use. The Command::parse() method provides a simple command-line argument parser, returning a Command struct. The parse method splits the passed in string on spaces and stores all of the non-empty arguments in the args field as a StackVec using the passed in buf as storage. You must implement Command::path() yourself.

Use all of your available libraries (Command, StackVec, Console via CONSOLE, kprint!, kprintln!, and anything else!) to implement a shell in the shell function. Your shell should print the prefix string on each line it waits for input. In the GIF above, for instance, "> " is being used as the prefix. Your shell should then read a line of input from the user, parse the line into a command, and attempt to execute it. It should do this ad-infinitum. Since our operating system is only just beginning, we can’t run any interesting commands just yet. We can, however, build known commands like echo into the shell.

To complete your implementation, your shell should…

Test your implementation by calling your new shell() function in kern/src/main.rs. Minus the “SOS” banner, you should be able to replicate the GIF above. You should also be able to test all of the requirements we’ve set. Once your shell works as expected, revel in your accomplishments. Then, proceed to the next phase.

A byte literal, b'a' is the u8 ASCII value for a character 'a'.

Use \u{b} in a string literal to print any character with ASCII byte value b.

You must print both \r and \n to begin a new line at the line start.

To erase a character, backspace, print a space, then backspace again.

Use StackVec to buffer the user’s input.

You’ll find the core::str::from_utf8() function useful.

How does your shell tie the many pieces together? (shell-lookback)

Your shell makes use of much of the code you’ve written. Briefly explain: which pieces does it makes use of and in what way?