Explore the ownership system in Rust
Updated for Rust 1.0.
This guide is for a reader who knows the basic syntax and building blocks of Rust but does not quite grasp how ownership works.
We will start very simple and then gradually increase complexity at a slow pace, exploring and discussing every new bit of detail. This guide assumes a very basic familiarity with the let, fn, struct, trait and impl constructs.
Our goal is to learn how to write a new Rust program and not hit any walls related to ownership.
- After a short introduction, we will learn about the Copy trait,
- then about the immutable and mutable ownership rules.
- Then we will look at the power of the ownership system in memory management, reference counting and concurrency.
Prerequisites - What you already know
Scope/stack based memory management is quite intuitive, because we are very familiar with it.
What happens to i at the end of the main function? It goes out of scope and dies, right?
If we pass this i to another function foo, how many times will it “die”? Well, it will “die” twice: first at the end of foo, and then at the end of main. If you modify it in foo, it will not affect the value in main. The value gets copied at the call of foo(i).
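A minimal sketch of the two “deaths”, assuming i64 as the integer type. Here foo returns its modified copy, which is an addition for illustration only, so that the effect of the copy can be observed:

```rust
// `i64` is Copy: calling `foo(i)` hands `foo` its own copy of the bits.
fn foo(mut i: i64) -> i64 {
    i += 1; // changes only foo's copy
    i
} // foo's copy of `i` goes out of scope ("dies") here

fn main() {
    let i = 5;
    let changed = foo(i);
    println!("foo saw {}, main still has {}", changed, i); // foo saw 6, main still has 5
} // main's `i` goes out of scope here
```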
In Rust, as in C++ (and some other languages), it is possible to use your own type instead of an integer. The value will be allocated on the current stack, and it will be destroyed (its destructor will be called) when it goes out of scope.
However, unless the type implements the Copy trait, the Rust compiler handles it with different rules: the ownership rules. Therefore we need to talk about the Copy trait first, and get it out of the way.
Copy Trait
The Copy trait makes your type behave in a very familiar way: its bits are copied to another location when it is assigned or used as a function argument, exactly like a built-in integer.
For example, a simple struct can be made copyable, but not by default: we have to tell the compiler explicitly that it is Copy (usually with #[derive(Copy, Clone)], since Copy requires Clone). Otherwise it would always be moved to another location and would follow the ownership rules.
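A minimal sketch with a hypothetical Info struct (the name and field are not from the original text). Because Copy is derived, the value stays usable after it is assigned to another variable or passed by value:

```rust
// Copy requires Clone, so both are derived; Debug is only for printing.
#[derive(Copy, Clone, Debug)]
struct Info {
    value: i32,
}

// Takes `Info` by value; the caller's copy stays untouched.
fn bump(mut info: Info) -> Info {
    info.value += 1;
    info
}

fn main() {
    let a = Info { value: 1 };
    let b = a; // bits copied; `a` is still usable
    let c = bump(a); // copied again into the argument
    println!("{:?} {:?} {:?}", a, b, c);
}
```

Removing Copy and Clone from the derive turns `let b = a;` into a move, and the later uses of `a` no longer compile.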
But we are actually interested in ownership, so from now on we will concentrate on non-Copy types!
Ownership
Ownership rules ensure that, at any point, a single non-copyable value has only one owner that can change it.
Therefore, if a function is responsible for deleting this value, it can be sure that no other users will try to access, change or delete it in the future.
Let’s see some examples!
Say hello to Bob, our brave new dummy structure
To demonstrate how the data moves around, we will create a new struct and call it Bob.
In the Bob constructor, new, we will announce that it was created.
When Bob gets destroyed (sorry, Bob!), we will print its name by implementing the drop method of the built-in Drop trait.
And to make a bob value formattable when printing to the console, we will implement the fmt method of the built-in Debug trait.
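Put together, Bob might look like the sketch below. The field name `name` and the exact print formats are assumptions, chosen so that the output matches the listings in this guide:

```rust
use std::fmt;

// Test dummy: announces its creation and its destruction.
struct Bob {
    name: String,
}

impl Bob {
    // Constructor: prints `new bob "A"` for `Bob::new("A")`.
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

// Called automatically when a Bob is destroyed.
impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

// Lets `{:?}` print a Bob as `bob "A"`.
impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    Bob::new("A"); // prints new bob "A", then immediately del bob "A"
}
```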
Let’s put it to the Test!
When we create Bob in the main function, we get a predictable result:
new bob "A"
del bob "A"
OK, it got deleted somehow - but when exactly?
Let’s insert a “print” statement at the end of the function:
new bob "A"
del bob "A"
end is near
It was deleted before the end of the function. The return value was not assigned to anything, so the compiler called our drop and destroyed the value right there.
What if we bind the returned value to a variable?
new bob "A"
end is near
del bob "A"
With let, it was deleted at the end of the function, that is, at the end of the variable's scope. So the compiler simply destroys bound values at the end of their scope.
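Both cases side by side, as a sketch (Bob is repeated so the block compiles on its own; its field and print formats are assumptions):

```rust
use std::fmt;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    // Not bound: the temporary is dropped at the end of this statement.
    Bob::new("A");
    // Bound with `let`: lives until the end of `main`. The leading underscore
    // only silences the unused-variable warning; unlike `let _ = ...`, it still
    // keeps the value alive until the end of the scope.
    let _bob = Bob::new("B");
    println!("end is near");
} // `_bob` ("B") is dropped here
```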
Destroyed Unless Moved
There is a catch, though: values can be moved somewhere else, and if they get moved, they won’t get destroyed at the end of the current scope!
How to move them? Well, simply pass them as values to another function.
Let’s pass our bob value to a function named black_hole:
new bob "A"
imminent shrinkage bob "A"
del bob "A"
end is near
It got destroyed in the black hole, and not at the end of main!
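A sketch of black_hole and its caller, assuming black_hole simply prints its argument (Bob is repeated so the block compiles on its own):

```rust
use std::fmt;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

// Takes ownership of its argument; the value is dropped when the function returns.
fn black_hole(bob: Bob) {
    println!("imminent shrinkage {:?}", bob);
}

fn main() {
    let bob = Bob::new("A");
    black_hole(bob); // `bob` is moved into `black_hole` here
    // black_hole(bob); // a second call is rejected: use of moved value `bob`
    println!("end is near");
}
```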
But wait… What happens if we try to send Bob to the black hole twice?
<anon>:33:16: 33:19 error: use of moved value: `bob`
<anon>:33 black_hole(bob);
^~~
<anon>:32:16: 32:19 note: `bob` moved here because it has type `Bob`, which is non-copyable
<anon>:32 black_hole(bob);
^~~
Simple! The compiler makes sure that we cannot use moved values, and explains nicely what happened.
There is no Magic - just some rules
To implement “memory safety without garbage collection”, the compiler does not need to go chasing your values around the code. It can decide what is destroyed in a function simply by looking at the function body.
You can easily do that too, if you know the rules. So far, we saw a few of them:
- Unused return values are destroyed.
- All values bound with let are destroyed at the end of the scope, unless they are moved.
Here you go, memory safety based on the fact that there can only be a single owner of a value.
However, so far we have talked only about immutable let bindings; the rules get slightly more complicated when the value can be changed.
Mutable Ownership
All owned values can be mutated: we just need to put them into a mut slot with let. For example, we can mutate part of bob, such as its name:
new bob "A"
del bob "mutant"
We created it with name “A”, but deleted a “mutant”.
If we give this value to another function, mutate, we can also assign it to a mut slot there:
new bob "A"
del bob "mutant"
So, it is possible to make an owned value mutable at any time.
Useful to know: function arguments can also be made mutable, because they are bindable slots that work the same way as a let slot.
So the function from the previous example can be shortened by declaring its argument as mut.
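A sketch of the three variants from this section: mutating a value in its own mut slot, rebinding an argument into a mut slot, and declaring the argument itself mut. The name mutate_short is hypothetical, used only so both function versions fit in one block:

```rust
use std::fmt;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

// Rebinds the owned argument into a `mut` slot, then mutates it.
fn mutate(value: Bob) {
    let mut bob = value;
    bob.name = String::from("mutant");
} // the mutated value is dropped here: del bob "mutant"

// Same effect: the argument itself is declared as a `mut` slot.
fn mutate_short(mut bob: Bob) {
    bob.name = String::from("mutant");
}

fn main() {
    let mut bob = Bob::new("A");
    bob.name = String::from("mutant"); // mutated in place; dropped at the end of main
    mutate(Bob::new("B"));
    mutate_short(Bob::new("C"));
}
```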
Replacing a value in mutable slot
What happens if we try to overwrite a value in a mut slot? Let’s see:
new bob "A"
before overwrite
new bob "B"
del bob "A"
after overwrite
before overwrite
new bob "C"
del bob "B"
after overwrite
del bob "C"
The old value gets deleted. The newly assigned value will be deleted at the end of scope - unless it is moved or overwritten again.
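A sketch of the overwrite sequence. It uses a loop instead of repeating the two assignments, and prints the final value, which adds one line (final: bob "C") before the last del compared with the listing above:

```rust
use std::fmt;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    let mut bob = Bob::new("A");
    for name in &["B", "C"] {
        println!("before overwrite");
        // The new Bob is created first; then the old value in the slot is dropped.
        bob = Bob::new(name);
        println!("after overwrite");
    }
    println!("final: {:?}", bob);
} // the last assigned value ("C") is dropped here
```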
Mutable Ownership rules
So, there is one additional rule for mutable slots:
- Unused return values are destroyed.
- All values bound with let are destroyed at the end of the scope, unless they are moved.
- Replaced values are destroyed.
Kind of obvious. The point is that in Rust we can be sure nothing else owns or references the replaced value, so it is safe to destroy it right away.
The power of Ownership system
These ownership rules might seem a tad limiting at first, but only because we are used to a different set of rules. They do not limit what is actually possible, they simply give us a different foundation for building higher-level constructions.
Some of these constructions are way harder to make safe in other languages. Even if they are made safe, they do not necessarily provide compile-time safety guarantees.
We will now overview some of them, available in the standard library.
Memory Allocation
So far we have talked about integer-like values that live on the stack.
Our test dummy Bob was such a value. While some popular languages can also keep values on the stack (struct in C#, or value instantiation without new in C++), many do not.
Instead, a newly constructed object instance (in many languages, created with a new operator) lives in what is called heap memory.
Heap memory has some advantages. First, it is not limited by the stack size: placing a huge structure on the stack might simply overflow it. Second, its memory location does not change, unlike the location of a stack value. Every time a stack-allocated value is moved or copied, the actual bits need to be copied from one place on the stack to another. While this is very efficient for a small structure (the values are always “nearby”), it can become slower as the structure grows bigger.
Box solves this by moving our value to the heap and keeping only a small pointer to the heap location on the stack.
For example, when we create our Bob in heap memory with Box::new, we get:
new bob "A"
del bob "A"
The type of the value bob returned from Box::new is Box<Bob>. This generic type makes the lifecycle of Bob managed by the Box<Bob> wrapper: Bob is deleted when the Box is deleted.
Box is not copyable, and follows the same ownership rules discussed previously. When it reached the end of its life on the stack, its destructor drop was called, which in turn called drop on the Bob and cleaned up the memory on the heap.
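A sketch of the boxed Bob (Bob is repeated so the block compiles on its own). The size print only illustrates that the stack part of a Box is a single pointer:

```rust
use std::fmt;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    // Bob is moved into a heap allocation; `bob` itself is a pointer-sized handle.
    let bob: Box<Bob> = Box::new(Bob::new("A"));
    println!("{:?}", bob); // Box<T> forwards Debug to T: bob "A"
    println!("handle size: {} bytes", std::mem::size_of::<Box<Bob>>());
} // dropping the Box drops the Bob, then frees the heap memory
```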
The triviality of this implementation is a big deal. If we compare it to the solutions in other languages, they mostly do one of two things. They either leave it up to you to clean up the memory (with some horrible delete statement someone will forget or call twice), or rely on garbage collection to track memory pointers and clean up memory when those pointers are no longer referenced.
In Rust, ownership tracking has no runtime penalty and is verified at compile time. This simple memory deallocation via Box builds directly on ownership tracking; it is small, safe and quite often sufficient.
When it is not sufficient, there are other tools that can help with that.
Reference Counting
Rust has enough low-level tools for reference counting to be implemented as a library. It can be used in the rare cases when a value has several owners, so its end of life cannot be determined statically at compile time.
Rust has a better name for it: shared ownership.
The std::rc module provides a way to share ownership of the same value between different Rc handles. The value remains alive as long as there is at least one handle for it.
For example, we can make a bob instance managed by an Rc handle this way:
new bob "A"
bob "A"
del bob "A"
We could change our black_hole function to accept Rc<Bob> and check whether it is destroyed there. But it is more convenient to make it accept any type T that implements the Debug trait (so we can print it), which means making it generic.
It works the same, and we will not need to change it for every new type.
Now, back to sending Rc<Bob> to the black hole!
new bob "A"
imminent shrinkage bob "A"
bob "A"
del bob "A"
Plot twist: happy ending! Bob survives the black hole!
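A sketch of the generic black_hole together with the Rc example (Bob is repeated so the block compiles on its own). The where clause is one way to write the Debug bound; an inline `T: fmt::Debug` would work as well:

```rust
use std::fmt;
use std::rc::Rc;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

// Accepts any value whose type implements Debug, and drops it on return.
fn black_hole<T>(value: T)
where
    T: fmt::Debug,
{
    println!("imminent shrinkage {:?}", value);
}

fn main() {
    let bob = Rc::new(Bob::new("A"));
    black_hole(bob.clone()); // only a clone of the handle falls in
    println!("{:?}", bob); // Rc<T> forwards Debug to T: bob "A"
} // the last handle is dropped here, so the Bob is dropped too
```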
Great! How does this work?
Once wrapped by an Rc handle, bob will live as long as there is a live Rc clone somewhere. The Rc handle internally uses Box to place the new value in heap memory, together with a reference count (RC).
Every time a new handle is created (by calling clone on an Rc), the RC is increased, and every time a handle reaches the end of its life, the RC is decreased. When the RC reaches zero, the object itself is dropped and its memory is deallocated.
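The count can be observed directly with Rc::strong_count. A minimal sketch, using a String instead of Bob to keep it short:

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared"));
    println!("count = {}", Rc::strong_count(&a)); // count = 1
    {
        let b = Rc::clone(&a); // clones the handle, not the String
        println!("count = {}", Rc::strong_count(&b)); // count = 2
    } // `b` reaches the end of its life: the count goes back down
    println!("count = {}", Rc::strong_count(&a)); // count = 1
} // the count reaches zero: the String is dropped and its memory freed
```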
Note that an Rc only gives shared, read-only access to its contents. If the contents of Bob need to be mutated, Bob can additionally be wrapped in the RefCell type, which allows a mutable borrow of our single bob instance. In the following example, it is mutated in a mutate function:
new bob "A"
RefCell { value: bob "mutant" }
del bob "mutant"
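A sketch of the Rc<RefCell<Bob>> combination, assuming mutate takes a cloned handle and renames Bob (Bob is repeated so the block compiles on its own):

```rust
use std::cell::RefCell;
use std::fmt;
use std::rc::Rc;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn mutate(bob: Rc<RefCell<Bob>>) {
    // borrow_mut() checks at run time that no other borrow is active.
    bob.borrow_mut().name = String::from("mutant");
}

fn main() {
    let bob = Rc::new(RefCell::new(Bob::new("A")));
    mutate(bob.clone());
    println!("{:?}", bob); // RefCell { value: bob "mutant" }
}
```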
The RefCell is used to provide what is called interior mutability. It is just one of the tools in the Rust toolbox for solving a specific problem.
So, the point is: different low-level utilities in Rust can be combined to achieve precisely what is needed with minimal overhead.
For example, Rc can only be used within a single thread, but there is an Arc type for atomic reference counting that is usable between threads. An Rc combined with a mutable wrapper such as RefCell might create reference cycles when multiple objects reference each other, and such cycles are never freed. However, an Rc can be downgraded into a Weak reference, which does not keep the value alive. More information can be found in the official documentation.
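A minimal sketch of a Weak reference obtained with Rc::downgrade, using a String for brevity. Once the last strong handle is gone, upgrade returns None:

```rust
use std::rc::{Rc, Weak};

fn main() {
    let strong = Rc::new(String::from("bob"));
    let weak: Weak<String> = Rc::downgrade(&strong);
    // The Weak handle is counted separately and does not keep the value alive.
    println!(
        "strong = {}, weak = {}",
        Rc::strong_count(&strong),
        Rc::weak_count(&strong)
    ); // strong = 1, weak = 1
    println!("{:?}", weak.upgrade()); // Some("bob")
    drop(strong); // the last strong handle is gone: the String is dropped
    println!("{:?}", weak.upgrade()); // None
}
```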
Most importantly, more advanced memory management mechanisms can (and will) be implemented later, and they can be done as libraries.
Concurrency
It is interesting to see how Rust changes the way we work with threads. The default mode here is no data races. It is not because there are some special safety walls around threads, no. With Rust you can build your own threading library with similar safety properties, simply because the ownership model is in itself thread-safe.
Consider what happens when we send two values into a new Rust thread, a Bob (movable) and an integer (copyable):
new bob "A"
waiting for thread to end
From thread, bob "A" and 12!
del bob "A"
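A sketch of the thread example, assuming a move closure (thread::spawn requires the closure to own what it captures). The order of the "waiting" and "From thread" lines can vary from run to run:

```rust
use std::fmt;
use std::thread;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    let bob = Bob::new("A");
    let i = 12;
    // `move` makes the closure take ownership: `bob` is moved, `i` is copied.
    let child = thread::spawn(move || {
        println!("From thread, {:?} and {}!", bob, i);
    }); // `bob` is dropped inside the thread when the closure finishes
    println!("waiting for thread to end");
    child.join().unwrap();
}
```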
What is happening there? First, we create two values: bob and i. Then we create a new thread with thread::spawn and pass it a closure to execute. This closure captures our variables bob and i.
Capturing means different things for Bob and i. Because Bob is non-Copy, it is moved to the new thread. The i is copied there. While the thread is running, we can modify the original i in main (if needed); this does not influence the copy that was passed to the thread.
Bob, however, is now owned by the new thread, and cannot be used or modified by the main thread unless the new thread returns it somehow. If we wanted, we could return it to the main thread through child.join() (join waits for the thread to finish):
new bob "A"
waiting for thread to end
bob "mutant"
del bob "mutant"
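A sketch of handing Bob back through join, assuming the thread renames Bob before returning it (Bob is repeated so the block compiles on its own):

```rust
use std::fmt;
use std::thread;

// Test dummy (field and print formats assumed).
struct Bob {
    name: String,
}

impl Bob {
    fn new(name: &str) -> Bob {
        println!("new bob {:?}", name);
        Bob { name: name.to_string() }
    }
}

impl Drop for Bob {
    fn drop(&mut self) {
        println!("del bob {:?}", self.name);
    }
}

impl fmt::Debug for Bob {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bob {:?}", self.name)
    }
}

fn main() {
    let bob = Bob::new("A");
    let child = thread::spawn(move || {
        let mut bob = bob; // the moved value, rebound as mutable
        bob.name = String::from("mutant");
        bob // returned to whoever joins this thread
    });
    println!("waiting for thread to end");
    let bob = child.join().unwrap(); // ownership comes back to main
    println!("{:?}", bob);
} // del bob "mutant"
```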
One could say that this does not change much about the way we are used to working with threads: we know not to share the same memory location between threads without some kind of synchronisation. The difference here is that Rust can enforce these rules at compile time.
Of course, more is possible in Rust; for example, channels can be used for sending and receiving data between threads in more efficient ways. More is available in the official threading documentation, the channel documentation, and the book.
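A minimal channel sketch using std::sync::mpsc. The produce helper and the message format are hypothetical; the point is that each sent value is moved into the channel, so ownership passes from one thread to another:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: a worker thread sends `n` owned Strings through a channel.
fn produce(n: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for i in 0..n {
            // Each String is moved into the channel; the sender cannot use it afterwards.
            tx.send(format!("message {}", i)).unwrap();
        }
        // `tx` is dropped here, at the end of the closure, which closes the channel.
    });
    // Iteration ends once the channel is closed and empty.
    let received: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    received
}

fn main() {
    for message in produce(3) {
        println!("{}", message);
    }
}
```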
What Else?
We got familiar with the ownership system in Rust to the point where we almost feel comfortable jumping in, browsing the docs, and creating great and safe programs with it.
But the other side was glossed over completely: the borrowing system.
Initially, I was planning to write a two-parter, with the second part about borrowing. But honestly, there are already many resources about it, so I no longer feel like continuing. Sorry!