Short intro to C++ for Rust developers: Ownership and Borrowing

Today, there was a reddit post that asked what one needs to know when Going after C++ with Rust basics. I thought this was an interesting question to answer in a blog post and revive my blog.

Since I got C++ job after learning Rust, I thought it would be interesting to write a summary how one would adapt to C++ with some prior Rust experience.

I would assume the reader already knows C++ syntax and features, and would be interested in how one would fit concepts to C++ from Rust world.

In this post, however, I could not fit everything I wanted to write, so I will focus on Ownership, Borrowing and Lifetimes.

Ownership and Moves

The big feature in Rust is Ownership, which means that non-primitive values are moved by default instead of being copied.

As an example, if we create a String in Rust and pass it to another function, it will be moved into that function and destroyed there.

fn foo(val: String) {
    // val destroyed here
}

fn main() {
    let val = String::from("Hello");
    foo(val);
    // accessing val here is compile-time error
}

Let’s look at the same code in C++:

#include <string>

using std::string;

void foo(string val) {
    // val is destroyed here
}

int main() {
    string val("Hello");
    foo(val);
    // accessing val here is fine, because we passed a copy to function
    // original val is destroyed here
}

You may be tempted to reduce copying in C++ too.

The C++ has this notion of lvalues versus rvalues.

In C++, lvalues are copied, while rvalues can be moved, if the type actually implements move operations (and I am glossing over a lot of details here).

There is a function in C++ std library that allows us to transform any lvalue to rvalue, called std::move.

So, we can modify our previous C++ program to behave similarly to Rust program and avoid unnecessary copy by wrapping val with std::move:

#include <string>

using std::string;

void foo(string val) {
    // val is destroyed here
}

int main() {
    string val("Hello");
    foo(std::move(val));
    // warning: accessing val here is NOT fine!
    // original val is also destroyed here, but contains no value so it's fine
}

Note that std::move does not actually move anything, it just changes how the compiler treats the value at this particular place. In this case, move works because std::string implements move operations.

In C++, it is possible to accidentally use moved value. Therefore, the move operations usually set the original container size to zero.

Therefore, a good practice in C++ is to avoid using move in the case like this, even if this means unnecessary deep copy of the value, to avoid the accidental usage of the moved value.

If the copy of value is actually costly and should not be copied, it is worth wrapping it into unique_ptr (like Box) or shared_ptr (like Arc), which will keep a single instance of the value on the heap. Relying on move in such case is very fragile and incurs a maintenance cost to keep the program correct.

Functions and Methods

Const references

In Rust, you can create a function that immutably borrows a value:

fn foo(value: &String) {
    println!("value: {}", value);
}

The Rust compiler will not allow calling methods or operations on String that modify contents of that String. In Rust-talk it would not allow to call methods that mutably borrow a string or need to take ownership of a string.

In C++, you can do the same:

#include <string>
#include <iostream>

using std::string;
using std::cout;
using std::endl;

void foo(const string& value) {
    cout << "value: " << value << endl;
}

The const T& idiom is similar to &T in Rust. C++ compiler will not allow modifying the contents of const T& object. In C++-talk, the C++ would not allow to call methods on the string that are non-const.

Const methods

Let’s say we have structure Person in Rust, and use it as parameter for function print_full_name:

struct Person {
    first_name: String,
    last_name: String,
}

fn print_full_name(person: &Person) {
    println!("{} {}", person.first_name, person.last_name);
}

This function could be made into a method on Person:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    pub fn print_full_name(&self) {
        println!("{} {}", self.first_name, self.last_name);
    }
}

Note that print_full_name can only access &self reference immutably. In C++, this is achieved with const modifier on the method:

#include <string>
#include <iostream>

class Person {
private:
    std::string first_name;
    std::string last_name;
public:
    void print_full_name() const {
        std::cout << first_name << " " << last_name << std::endl;
    }
};

In Rust, we would be able to use print_full_name method in places where Person can be borrowed immutably.

fn foo(person: &Person) {
    person.print_full_name();
}

In C++, we will be able to use print_full_name in places where Person can be const.

void foo(const Person& person) {
    person.print_full_name();
}

Methods that Mutably Borrow in C++

In Rust, methods that modify the reference must use &mut reference. For example, a method implemented on Person:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    pub fn clear_name(&mut self) {
        self.first_name.clear();
        self.last_name.clear();
    }
}

Or a standalone method:

fn foo(person: &mut Person) {
    person.clear_name(); // "clear_name" mutably re-borrows Person
}

In C++, this is simply any method without const qualifier:

#include <string>

class Person {
private:
    std::string first_name;
    std::string last_name;
public:
    void clear_name() {
        first_name.clear();
        last_name.clear();
    }
};

And any method that takes non-const reference:

void foo(Person& person) {
    person.clear_name();
}

Methods that Take Ownership in C++

As discussed previously, it is possible in C++, but is considered a bad practice, and you should leave moves up to the compiler.

However, there is a few cases where manual std::move might be ok. One of them is a setter function.

Consider a Rust method that changes the name:

struct Person {
    name: String,
}

impl Person {
    pub fn set_name(&mut self, name: String) {
        self.name = name;
    }
}

We can call it in some function foo that had the ownership of the name:

fn foo(person: &mut Person, name: String) {
    person.set_name(name); // requires explicit clone
}

In Rust, the set_name will take the ownership of name be default. However, C++ it would copy by default.

Same method in C++:

#include <string>

class Person {
private:
    std::string name;
public:
    void set_name(std::string name) {
        this->name = std::move(name); // we can safely move
    }
};

We can safely move inside the setter, because we have a parameter that is already a copy. However, we did not avoid the copying at the call site:

void foo(Person& person, std::string name) {
    person.set_name(name); // copy
}

We can use std::move here:

void foo(Person& person, std::string name) {
    person.set_name(std::move(name)); // move
}

However, the caller of foo must do the same to ensure the move, and this cycle continues.

One thing to look for when using std::move is mutable references! Let’s say we had a mutable reference in function foo, and moved the value:

void foo(Person& person, std::string& name) {
    person.set_name(std::move(name)); // move clears the original name
}

Now the caller of foo will suddenly find the name gone.

In this particular case, the better practice is to use const T& reference all the way down to the setter. This will create a copy of name inside the setter, with a minimal overhead.

However, if the name was a very big string, i.e. something like file contents, and it would be necessary to ensure no copies for performance reasons, the unique_ptr or shared_ptr would come to the rescue:

#include <string>
#include <memory>

class Person {
private:
    std::shared_ptr<std::string> personal_page;
public:
    void set_personal_page(const std::shared_ptr<std::string>& personal_page) {
        this->personal_page = personal_page; // note that we copy here
    }
};

Note that we leave the copy in, but what we copy now is only a Arc pointer that points to the same memory contents.

Lifetimes

One idiomatic thing in Rust is exposing value’s contents for external mutation. All iterators in Rust are built on this concept, as well as many standard library functions.

For example, we may add a method for Person that allows someone else to change the first and the last names:

#[derive(Debug)]
struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    pub fn get_first_name_mut(&mut self) -> &mut String {
        &mut self.first_name
    }

    pub fn get_last_name_mut(&mut self) -> &mut String {
        &mut self.last_name
    }
}

Then we can have a function that appends “foo” to a string reference:

fn append_foo(value: &mut String) {
    value.push_str(" foo");
}

Then we can write some code that allows some external function to modify contents of a String inside the Person:

fn main() {
    let mut p = Person {
        first_name: String::from("John"),
        last_name: String::from("Smith"),
    };

    append_foo(p.get_first_name_mut());
    append_foo(p.get_last_name_mut());

    println!("{:?}", p);

    // output:
    // Person { first_name: "John foo", last_name: "Smith foo" }
}

As you may know, the Rust compiler understands lifetime elision. That means you usually do not need to annotate any references with lifetimes, but they are still there.

For example, impl of Person has these lifetime annotations:

impl Person {
    pub fn get_first_name_mut(&'a mut self) -> &'a mut String {
        &mut self.first_name
    }
}

References are basically pointers. The lifetime syntax &'a mut communicates to the compiler that the returned value must point to the same or narrower memory location 'a as the function argument.

If we tried to return a reference to the value which is outside of 'a, the compiler would complain:

impl Person {
    pub fn get_first_name_mut(&'a mut self) -> &'a mut String {
        &mut String::from("Other") // error: borrowed value does not live long enough
        //   ^^^^^^^^^^^^^^^^^^^^^ temporary value created here
    }
}

Therefore, at the call site, the compiler knows that the Person is borrowed for every call to append_foo and would not allow us to do anything funky:

fn main() {
    let mut p = Person {
        first_name: String::from("John"),
        last_name: String::from("Smith"),
    };

    {
        let name: &mut String = p.get_first_name_mut();
        p.first_name = String::from("Crash");
        // error: cannot assign to `p.first_name` because it is borrowed
        append_foo(name);
    }
}

The C++, however, has no machinery to understand where the pointers or references point to, and does not help. However, we can still implement the same in C++.

First, the Person:

class Person {
public:
    std::string first_name;
    std::string last_name;

    Person(std::string first_name, std::string last_name)
    : first_name(std::move(first_name))
    , last_name(std::move(last_name))
    {}

    std::string& get_first_name_mut() {
        return this->first_name;
    }

    std::string& get_last_name_mut() {
        return this->last_name;
    }
};

Similar to setters, we used std::move trick in constructor to avoid copies. This is a usual practice in C++.

Then we create append_foo, which is nothing surprising:

void append_foo(std::string& value) {
    value += " foo";
}

And finally, the main function:

int main() {
    Person p("John", "Smith");

    append_foo(p.get_first_name_mut());
    append_foo(p.get_last_name_mut());

    std::cout << "first name: " << p.first_name << std::endl;
    std::cout << "last name: " << p.last_name << std::endl;

    // output:
    // first name: John foo
    // last name: Smith foo
}

However, the C++ compiler is not able to track lifetimes and ensure memory safety.

This is a problem when you get used to these things being verified by the compiler. The objects we have just written might become more complex, and it would become much harder to track runaway modifications to Person:

int main() {
    Person p("John", "Smith");

    std::string& name = p.get_first_name_mut();
    p = Person("Crash", "Bob");
    append_foo(name);

    // Output:
    // first name: Crash foo
    // last name: Bob
}

It worked, even when we have overwritten the memory location of Person. This actually may continue working. Or it may fail in release build. Or it may fail when other developer wraps Person in shared_ptr:

int main() {
    auto p = std::make_shared<Person>("John", "Smith");

    std::string& name = p->get_first_name_mut();
    p = std::make_shared<Person>("Crash", "Bob");
    append_foo(name);

    std::cout << "first name: " << p->first_name << std::endl;
    std::cout << "last name: " << p->last_name << std::endl;

    // Output:
    // first name: Crash
    // last name: Bob
}

Now, we modified freed memory, which worked, but may not work if something else was written in that previous memory location.

The better practice in C++ is to avoid methods that return mutable references. Instead, we could access the fields directly (but trade away privacy):

int main() {
  Person p("John", "Smith");

  append_foo(p.first_name);
  append_foo(p.last_name);
}

Or create the additional copy, which is not really a big deal:

std::string append_foo(const std::string& value) {
    // set capacity and avoid multiple allocations
    std::string ret;
    ret.reserve(value.size() + 4);
    ret += value;
    ret += " foo";
    return ret;
}

int main() {
    Person p("John", "Smith");

    p.first_name = append_foo(p.first_name);
    p.last_name = append_foo(p.last_name);
}

Conclusion

The big hurdle when moving back to C++ from Rust was the missing move-by-default feature. This required learning other idiomatic patterns in C++ land, and in some cases admitting that not all the code needs to be both efficient and easy to maintain.

In most cases maintainability wins, and avoiding “premature optimization” is very much a necessity in C++.