3. Pointer / memory related features

As rust has no garbage collector, programmer needs to allocate memory manually.

Table of contents

3.1 Memory in Computer

In operating system, memory (RAM) is divided into 5 parts (regions, segments): Text (code), bss, data, heap and stack. Those manage memory in runtime.

ref: data segment image from wikipedia
  1. Text (code)
    Text stores executable code, generally fixed size.

  2. Data
    Data stores initialized global variables and static variables.

  3. BSS (Block Started by Symbol)
    BSS generally depends on Programming language, however, like C programming language, manages unintialized static variable.

  4. Stack
    Stack is a LIFO (last-in first-out) data structure, using a mechanism like a piling up pancakes on a plate. Stack manages function calls, local variables, function parameters, and return addresses. Stack pointer is a special register in the CPU keeps track of the top of the stack, pointing the current memory address where new data can be added(malloc), also deleting(free) the topmost data easily.
    • When a function is called, a new stack frame is created on the stack to hold its local variables, parameters and return address.
    • For a memory allocation, compiler analyzes the code to determine the size of each function’s stack frame. Next, compiler generates instructions to adjust the stack pointer, allocating and dellocating space as needed.
    • function calls and returns automatically add (push) and remove (pop) stack frames, ensuring efficient memory usage.
  5. Heap
    Heap is a kind of less organized region of memory, which can be manually directed by user. In the sense of automatic managing, many languages support garbage collection in these days. Rust does not have a traditional garbage collector. Instead, it relies on ownership and borrowing. Rust also supports pointers like in C or C++ for user’s memory manipulation.

3.2. Owenership

Ownership is one of the unique features in rust for handling memory not only within a code block (scope) but also between code blocks.

the book suggests three ownership rules as follows:

  1. Each value in Rust has an owner.
  2. There can only be one owner at a time.
  3. When the owner goes out of scope, the value will be dropped.

Because of axiom #2 and #3, programmers who are familiar with other languages could have in trouble.

For instance, in the case of python, the following would work.

# python
if __name__ == "__main__":
    x = "hello"
    y = x
    print((id(x) == id(y))) # you can see True, they indicates the same memory address
    print(y) # you can get 'hello'
    print(x) # you can get 'hello'

with checking ids of x and y are equal, we can see x and y indicates the same address.

however, in rust programming, the following rust code which is similar to the above code won’t be compiled.

fn main(){
    let x = String::from("hello");
    let y = x;
    println!("{}", y);
    println!("{}", x); // compile error because x has no ownership of value
}

It seems that y and x “shares” the same address, however, it is not. As I mentioned first, in dealing with heap memory, rust basically moves the ownership from x to y. As y has got ownership of the string value “hello”, x cannot access the value until it retrieved ownership back or get new value assigned.

3.3. Pointers

To solve the above ownership problem, we can also use other options : use memory address!

3.3.1. Reference

The foremost option is that borrow x’s value with reference.
Here is an example of borrowing that y borrows the value from x, and return back automatically when y is called.

fn main(){
    let x = String::from("hello");
    let y = &x; // y borrows (refers) the value of x; 
    println!("{}", y);
    println!("{}", x); // no error: Still, x is owner of "hello"
}

This & operator is for referring to x, or borrowing x address. It seems that reference is the same as the pointer in C, however references are always valid and automatically managed by Rust’s borrow checker. They cannot outlive the data they brought and cannot be null.

You can see that those exmaples are based on string type because string in rust works allocating the memory based on move, not copy or clone. If you use the same examples but int allocation, there is no error since the basic trait of allocation in int is value shallow copying.

The following code which seems like simply adding a dereference operator * could make you a little bit crazy.

fn main() {
    let x = String::from("hello"); // we know, x is a string value "hello"; String
    let y = &x; // y borrows (refers) the value of x; &String
    let z = &*x; // (*x) is dereference of x, which is inner string (or string slice) : str 
                 // and then we refer &(*x), which is reference of string slice : &str

    println!("{}", y);
    println!("{}", x);
    println!("{}", z);
}

I added the explanation of x, y and z in the code above with inline comment.

3.3.2. Raw Pointer

Still, C-like pointer is also supported in Rust, named as raw pointer.

Raw pointers in Rust are necessary for certain tasks where you need to interact with low-level code, such as interfacing with C libraries, implementing low-level data structures, or performing certain operations that can’t be expressed safely within Rust’s safe abstraction.

Here are some scenarios where raw pointers are useful:

  1. Interfacing with C Code: Rust often needs to interact with C libraries, which typically use raw pointers extensively. Rust’s FFI (Foreign Function Interface) allows you to call functions from C libraries and vice versa, and raw pointers are often used to pass data between Rust and C code.

  2. Unsafe Operations: Some operations inherently require unsafe behavior, such as dereferencing a pointer to arbitrary memory or performing low-level memory manipulation. While these operations should be avoided whenever possible, there are situations where they’re necessary for performance reasons or to implement certain algorithms.

  3. Unsafe Abstractions: Sometimes, you may need to implement your own safe abstractions that rely on unsafe operations internally. While the interface exposed to the user remains safe, the implementation may use raw pointers or other unsafe constructs to achieve certain behaviors efficiently.

  4. Low-Level Data Structures: Implementing low-level data structures like linked lists, trees, or graphs may require direct manipulation of memory addresses, which is facilitated by raw pointers. While Rust’s standard library provides safe abstractions for common data structures, there are cases where custom implementations are necessary or desirable.

  5. Embedded Systems and Systems Programming: In systems programming and embedded systems development, you often need precise control over memory layout and low-level hardware interactions. Raw pointers allow you to express such operations safely within the context of an unsafe block.

It’s important to note that while raw pointers are a powerful tool, they come with significant responsibility. Rust’s safety guarantees are designed to prevent common programming errors and security vulnerabilities, and bypassing these guarantees with raw pointers can introduce bugs, crashes, or security vulnerabilities if used incorrectly.

fn main() {
    let x = 42;

    // Reference to x
    let reference = &x;
    println!("Reference: {}", reference);

    // Raw pointer to x
    let raw_ptr: *const i32 = &x as *const i32;
    unsafe {
        println!("Raw pointer: {}", *raw_ptr);
    }
}

3.3.3. Smart Pointer

Smart pointers in Rust are data structures that not only hold a value but also contain metadata and provide additional functionality beyond what regular pointers offer. They enforce various safety guarantees at compile time, ensuring memory safety and preventing common programming errors.

One of the most commonly used smart pointers in Rust is Box<T>. It allows you to allocate values on the heap rather than the stack and provides ownership semantics like any other value in Rust. Here’s a brief overview of Box<T>:

  • Box<T>: Box is a smart pointer that owns the data it points to and is stored on the heap (if you’re C++ programmer it resembles unique_ptr but not nullable.). It’s used when you need to have a value with a known size at compile time but don’t know the precise size until runtime, or when you want to transfer ownership of a value across scopes or threads.

Here’s a simple example of Box<T>:

fn main() {
    let x = Box::new(42); // Allocate an integer on the heap
    println!("Value: {}", x); // Print the value stored in the Box
}

In addition to Box<T>, Rust provides other smart pointers like Rc<T> and Arc<T> for shared ownership (shared_ptr in C++), Cell<T> and RefCell<T> for interior mutability, and Mutex<T> and RwLock<T> for synchronization, among others. Each smart pointer type has its own characteristics and use cases, allowing you to choose the appropriate one based on your requirements.

Smart pointers enable you to write safer, more expressive code by encapsulating complex memory management logic and providing clear ownership semantics.

3.4. Lifetimes

The following’s a code from the book.

fn longest(x: &str, y:&str) -> &str {
    if x.len() >= y.len() {
        return x;
    }
    else {
        return y;
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}

If you compile the code above, you will get an error

error[E0106]: missing lifetime specifier
 --> src/main.rs:1:32
  |
1 | fn longest(x: &str, y:&str) -> &str {
  |               ----    ----     ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`
help: consider introducing a named lifetime parameter
  |
1 | fn longest<'a>(x: &'a str, y:&'a str) -> &'a str {
  |           ++++     ++         ++          ++

For more information about this error, try `rustc --explain E0106`.

what is a lifetime?

As we mentioned in 3.1.3. reference, rust compiler prevent the reference outlive the data by using borrow checker : so to speak, try to prevent dangling pointer.

With this function of references signatures, rust compiler worry about the rest of reference. If x is returned, what about y? y could become a dangling pointer. vice versa.

Therefore, we need to notify to compiler (actually for ourselves) that the rest reference will be terminated in the same lifetime of the picked (returned) one.

In function longest, returning a reference to x would invalidate y (become a dangling reference) if x was longer. Lifetimes prevent this by making the references’ validity explicit.

The fixed version is as follows:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        return x;
    } else {
        return y;
    }
}