Intro to C++

What is C++

Definition

C++ is currently one of the world’s most popular programming languages. It was created by Bjarne Stroustrup as an extension of the C language. The language is well suited to building and maintaining large-scale software infrastructure. Its support for object-oriented programming allows for clear structure, modularity, and code reusability, which helps manage complexity in large projects such as web browsers, operating systems, and database engines. C++ is also a statically and strongly typed language, meaning that the type of every variable must be declared and is checked by the compiler, which helps catch errors at compile time and improves code reliability (see the section below on C++ types for more information on types).

Why use C++

There are thousands of online resources available for learning and troubleshooting C++, making it accessible for both beginners and experienced developers. C++ supports object-oriented programming, which enables you to design software using classes and objects, promoting encapsulation, inheritance, and polymorphism. Additionally, C++ offers fine-grained control over system resources and memory management, making it a preferred choice for high-performance applications and systems programming.

Writing C++

Basics

Lines end in semicolons
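Everything must be declared before it is used, even functions
Indexing starts at 0, whereas indexing in R starts at 1
Basic math operators (+, -, *, /) and assignment with = work much as they do in R, except that dividing two integers discards the remainder, e.g., 5 / 2 is 2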
Code must first be compiled before it is run
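
The points above can be illustrated with a minimal sketch. The function names (integer_quotient, half, real_quotient, first_of_three) are made up for this illustration and do not come from FIMS.

```cpp
// Declaration: tells the compiler that the function exists before it is used.
int integer_quotient(int a, int b);

// half() can call integer_quotient() because it was declared above, even
// though its definition only appears further down.
int half(int a) {
  return integer_quotient(a, 2);
}

// Dividing two ints discards the remainder, unlike / in R.
int integer_quotient(int a, int b) {
  return a / b;
}

// Dividing two doubles keeps the fractional part.
double real_quotient(double a, double b) {
  return a / b;
}

// Indexing starts at 0, so values[0] is the first element.
int first_of_three() {
  int values[3] = {10, 20, 30};
  return values[0];
}
```

With these definitions, integer_quotient(5, 2) evaluates to 2, whereas real_quotient(5.0, 2.0) evaluates to 2.5.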

Types

Every variable, function, and expression in C++ must have a type, which determines the amount of memory allocated and the operations that can be performed. This explicit nature is why C++ is considered a strongly typed language. C++ types are categorized into the following three main groups: Primitive (built-in), Derived, and User-Defined.

Primitive types are the basic building blocks in C++, similar to atomic types in R. They represent single values, are used directly, and serve as the foundation for more complex types.

bool: Boolean (true or false), like logical in R but lowercase
char: A single character or small integer
int: Integer
float: Single precision floating-point number, less precise than numeric in R
double: Double precision floating-point number, like numeric in R
void: No return value from a function

Derived types in C++ are built from primitive types.

Arrays: Like R vectors but fixed size and type
Pointers: Variables that store memory addresses, similar to environments or references in R
References: Aliases for existing variables, somewhat like assignment by reference in R

User-defined types in C++ are custom types created by the programmer.

Structs and Classes: These are similar to lists, data frames, or S3/S4 objects in R, allowing you to group different types of data and define custom behaviors and methods.
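
As a rough sketch of how the three groups fit together, the code below combines primitive types (int, double), derived types (a reference and a pointer to an array), and a user-defined type. The struct Fish and the functions grow and sum_lengths are hypothetical names chosen for this example only.

```cpp
#include <string>

// User-defined type: groups data of different types, loosely like an R list.
struct Fish {
  std::string name;  // text from the standard library
  int age;           // primitive type
  double length;     // primitive type
};

// Takes a reference (derived type), so the caller's object is modified.
void grow(Fish& fish, double amount) {
  fish.length += amount;
}

// Takes a pointer (derived type) to the first element of an array.
double sum_lengths(const double* lengths, int n) {
  double total = 0.0;
  for (int i = 0; i < n; i++) {
    total += lengths[i];
  }
  return total;
}
```

Because grow takes its argument by reference, calling it changes the length stored in the original Fish object rather than in a copy.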

Hello World example

In almost all tutorials, the Hello World program is the first introduction to writing a C++ program. Here we give two examples: the first has main, the function where every C++ program starts, return an integer, and the second (without comments in the code) illustrates the void return type for functions that do not return a value. Both examples print “Hello World!” to the terminal.

Example: Hello World

/*
 The include statement allows you to use the iostream library, which is part
 of the standard library (std) and provides input and output objects such as
 cout.
*/
#include <iostream>
/*
 `using namespace` allows you to use functions in the library that you specify,
 e.g., std, without specifying their namespace on each instance, e.g., you can
 use `cout` rather than `std::cout`.
*/
using namespace std;

/*
 main is the entry point to the program. The int before the function name
 declares that the function will return an integer.
*/
int main() {
  /*
   cout allows you to print a message directly to the screen when the function
   is called
  */
  cout << "Hello World!";
  return 0;
}

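If a function should not return anything, then the type declaration before the function name should be void. Note that main itself is an exception: standard C++ requires main to return int, and compilers such as g++ reject void main(). The Hello World program is therefore rewritten below (without comments) so that the printing is done by a void function, here called print_hello, which is called from main.

Example: Hello World using void

#include <iostream>
using namespace std;

void print_hello() {
  cout << "Hello World!";
}

int main() {
  print_hello();
  return 0;
}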

add example

The following function does a mathematical computation, taking two integers as inputs and returning an integer.

int add(int x, int y) {
  return x + y;
}

The function coded above only works as intended with integers. If you pass it doubles, C++ implicitly converts them to int and drops the fractional part (the compiler may warn about this), so add(1.5, 2.5) returns 3 rather than 4. This is one example of why types matter in a strongly typed language like C++. An additional function is needed to handle doubles as inputs. Using this version of add will return a floating-point number. See the section below on templated code for how to avoid having to write a function for each type.

double add(double x, double y) {
  return x + y;
}
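
Both versions of add can live in the same file because C++ allows functions to share a name as long as their argument types differ (overloading). The sketch below repeats the two functions and uses static_assert, a compile-time check, to show which version the compiler selects; the checks are an addition for illustration.

```cpp
#include <type_traits>

int add(int x, int y) {
  return x + y;
}

double add(double x, double y) {
  return x + y;
}

// The compiler picks the overload from the argument types, so the type of
// the result follows the type of the inputs.
static_assert(std::is_same<decltype(add(1, 2)), int>::value,
              "two ints select the int version");
static_assert(std::is_same<decltype(add(1.5, 2.5)), double>::value,
              "two doubles select the double version");

// A mixed call such as add(1, 2.5) does not compile because neither
// version is a better match than the other.
```

With both overloads present, add(1.5, 2.5) returns 4 because the double version is used and nothing is truncated.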

Vectors

Vectors are a compound type available in the standard library as std::vector. Compound types are built from other types, such as int. A vector of integers has the type std::vector<int>, a vector of doubles has the type std::vector<double>, and so on. All elements in a vector must have the same type, and that type must be declared when the vector is created, e.g., std::vector<int>. For example, in FIMS we have a vector of parameter names that we declare as a vector of strings, e.g., std::vector<std::string> parameter_names in information.hpp.

std::vector has a number of member functions (see the rcpp methods section in the Rcpp vignette) that can be used to operate on a vector. Below are a few member functions of std::vector that are commonly used or used within the FIMS code, e.g., we use resize quite often. See the documentation for std::vector for the full list of member functions.

begin: return iterator to beginning
end: return iterator to end
size: returns vector size
resize: changes the vector size
[]: access element
front: access first element
back: access last element
push_back: append element to end
pop_back: delete last element
clear: clear content

The code below illustrates how to create a vector and utilize some of the member functions. Specifically, it initializes a vector, resizes it to have a length of three, fills the elements using a for loop, uses the member function push_back() to add one additional element to the end of the vector, and returns a vector of doubles with the following four elements: 1.3 2.6 3.9 8.1.

Example: std::vector

#include <vector>

std::vector<double> create_vector() {
  std::vector<double> x;
  x.resize(3);
  x[0] = 1.3;
  for(int i=1; i<3; i++){
    x[i] = x[i-1] + 1.3;
  }
  x.push_back(8.1);

  return x;
}
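
Some of the other member functions listed above, such as begin, end, pop_back, and size, can be sketched as follows. The helper names sum_vector and drop_last are hypothetical and are not part of FIMS or the standard library.

```cpp
#include <cstddef>
#include <vector>

// Sum the elements by walking from begin() to end() with an iterator.
double sum_vector(const std::vector<double>& x) {
  double total = 0.0;
  for (std::vector<double>::const_iterator it = x.begin(); it != x.end(); ++it) {
    total += *it;  // *it is the element the iterator currently refers to
  }
  return total;
}

// Remove the last element (if there is one) and return the new size.
std::size_t drop_last(std::vector<double>& x) {
  if (!x.empty()) {
    x.pop_back();
  }
  return x.size();
}
```

Both functions take the vector by reference, so sum_vector avoids copying the vector and drop_last changes the caller's vector.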

Constants

If you do not want a value to change throughout the program, you can declare it as a constant. To do so, write const before the type in the declaration.

const int MinutesPerHour = 60;
const double pi = 3.14159;
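
A constant is used like any other variable, but the compiler rejects any attempt to change it. The function hours_to_minutes below is a hypothetical example added for illustration.

```cpp
const int MinutesPerHour = 60;

// Convert hours to minutes using the constant defined above.
int hours_to_minutes(int hours) {
  // MinutesPerHour = 61;  // would not compile: a const cannot be reassigned
  return hours * MinutesPerHour;
}
```

Here, hours_to_minutes(2) returns 120.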

Templates

Because C++ is a strongly typed language (see the section on C++ types), every function must be declared to return a single type, e.g., int, float, double, etc. For example, in the simple addition function that we wrote above, we had to specify two separate functions with the same name, i.e., add, one for adding integers and one for adding doubles.

It can be tedious and repetitive to write out functions for each type, and it can lead to more code to maintain and more potential for bugs. To simplify matters, C++ allows for templated code, which allows functions to take a generic type, e.g., Type, rather than a specific type, e.g., int. This is helpful for instances when the guts of a function are repeated multiple times but the input and output types differ on each instance.

For example, the code below is a single replacement for the two add functions written above. In fact, this templated function would also work for additional types beyond integers and doubles, such as booleans, because true and false values translate into 1s and 0s. Note, however, that the result is converted back to the template type, so my_add(true, true) returns true rather than 2.

template <class Type>
Type my_add(Type x, Type y){
  return x + y;
}
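
The sketch below shows a few ways the templated function can be called. The variable names three, mixed, and word are made up for this illustration.

```cpp
#include <string>

template <class Type>
Type my_add(Type x, Type y) {
  return x + y;
}

// Type is deduced as int from the arguments.
int three = my_add(1, 2);

// With mixed argument types the compiler cannot deduce a single Type, so it
// is stated explicitly; my_add(1, 2.5) on its own would not compile.
double mixed = my_add<double>(1, 2.5);

// The template works for any type with a + operator, e.g., std::string,
// where + joins the two strings.
std::string word = my_add(std::string("tem"), std::string("plate"));
```

Under these definitions, three is 3, mixed is 3.5, and word is "template".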

Syntax in FIMS

In templated code, class is interchangeable with typename, where typename is seen as a more modern way to write the templated code. Additionally, you do not have to use Type for the name of the type parameter. You could use a capital T, or many examples use a capital A, e.g., template <typename A> instead of template <class Type>. Within the FIMS codebase, we exclusively use class instead of typename, and we use Type as the name of the type parameter. You can see several examples of templated code in fims_math.hpp.

{Rcpp} cannot export a templated function directly to R, so you cannot call templated code such as my_add from R after compiling it with Rcpp::cppFunction() (see the vignette on {Rcpp} for where it does work). Instead, you have to compile the C++ code using the terminal.

Compiling directions

The example code for my_add given above only provides the function and does not actually run it. Below, we expand it to include the necessary header file so the output can be printed to the screen within an additional main program that runs my_add multiple times. In the example below, main returns the integer zero to signal the end of the program.

To compile and run the code that we give below you must save it to a .cpp file, e.g., my_templated_add.cpp, and navigate your terminal to the directory where the file is saved. Next you will need to compile the program (see the second code chunk), which will create an executable that you can then run to see the results of main. If you are on a Windows computer, g++ will be installed in your rtools directory but it might not be available in your path, which means you will have to specify the full path to the executable rather than just the executable name as we do in the example.

#include <iostream>

template <typename Type>
Type my_add(Type x, Type y) {
  return x + y;
}

int main(){
  //works with integers
  std::cout << "Type is int: " << my_add(1, 2) << std::endl;

  //works with floats (the f suffix makes the literals floats, not doubles)
  std::cout << "Type is float: " << my_add(1.2f, 3.2f) << std::endl;

  //works with doubles
  std::cout << "Type is double: " << my_add(1.52757396774, 6.83480375227) << std::endl;

  return 0;

}

Ensure that you navigate your terminal to where you saved my_templated_add.cpp, using cd, i.e., change directory, before you run the following code.

g++ my_templated_add.cpp -o a.exe
a.exe

Running the executable results in the following being printed to the screen.

Type is int: 3
Type is float: 4.4
Type is double: 8.36238

Classes

In C++, a class is a user-defined type that serves as a blueprint for creating objects. For example, Rcpp::NumericVector is a class within Rcpp that we talk about a lot in the Rcpp introduction vignette. Classes allow for the grouping of related data (called member variables or attributes) and functions (called member functions or methods) into a single, cohesive unit. This encapsulation makes it easier to model real-world entities and manage complexity in large programs (see cplusplus.com for more details). By defining classes, you can create multiple objects with the same structure and behavior, enabling code reuse, modularity, and the use of object-oriented programming principles such as inheritance and polymorphism.

Within your code, a class must first be defined before it can be used. Memory is only allocated, however, when an instance (object) of the class is created, which is called instantiation. There are several ways to create class instances in C++. You can declare an object directly (e.g., MyClass m; creates the object m of type MyClass) or you can use the new operator to allocate memory dynamically and create a pointer to the object (e.g., MyClass* m = new MyClass();). In modern C++ and in FIMS, it is common to use shared pointers (std::shared_ptr) to manage memory automatically and safely share ownership of objects across different parts of the program. If you are not yet familiar with pointers or dynamic memory allocation, do not worry; these concepts are explained in detail in the Pointers and References section below.

Once a class instance is instantiated, you can access its members (variables and functions) using either the dot operator (.) if you have a regular object that was created directly or the arrow operator (->) if you are working with a pointer or a shared pointer. This distinction is important for interacting with class members and is a fundamental part of working with objects in C++.

MyClass example

class MyClass{ //start of class called MyClass
  //Access specifier: can be private, public, or protected
  public:
  //Data members: variables to be used
  float a;
  float b;

  MyClass(){} //Constructor - automatically called when new object created

  //Member functions: Methods to access data members
  float my_add(){
    return a + b;
  }

}; //class ends with semicolon

Instantiate and initialize

Example: Use MyClass

// [[Rcpp::export]]
void my_class_add(){ //Use void because this function has no return value

  //Create instance of class using a declaration
  MyClass m1;

  //Members can be accessed and initialized using the . operator
  m1.a = 1.2;
  m1.b = 2.4;

  Rcpp::Rcout << "m1.my_add() = " << m1.my_add() << std::endl;

  //Create instance of class using a pointer
  MyClass* m2 = new MyClass();
  //Note that the above is equivalent to the following; the location of the
  //space is irrelevant
  //MyClass *m2 = new MyClass();

  //Members can be accessed using the -> operator
  m2->a = 3.3;
  m2->b = 4.4;

  Rcpp::Rcout << "m2->my_add() = " << m2->my_add() << std::endl;

  //Need to delete memory after use when instantiating with new
  delete m2;

  //Create instance of class with shared pointer
  //Note the type of the shared pointer is MyClass
  std::shared_ptr<MyClass> m3 = std::make_shared<MyClass>();

  //Members can be accessed using the -> operator
  m3->a = 4.2;
  m3->b = 1.7;

  Rcpp::Rcout << "m3->my_add() = " << m3->my_add() << std::endl;
}

The previous two code chunks are saved in cpp/my_class.cpp along with the necessary include statements. You can compile the file in an R session using {Rcpp} and call the program my_class_add(), see the following code:

Rcpp::sourceCpp("cpp/my_class.cpp")
my_class_add()

## m1.my_add() = 3.6
## m2->my_add() = 7.7
## m3->my_add() = 5.9
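
In MyClass, the members a and b hold unspecified values until they are assigned. One common alternative, sketched below, is to give the constructor arguments so that the members are set when the object is created. The class name MyInitializedClass and the argument names are hypothetical.

```cpp
class MyInitializedClass {
 public:
  float a;
  float b;

  // Constructor with a member initializer list: a and b are set on creation.
  MyInitializedClass(float a_value, float b_value) : a(a_value), b(b_value) {}

  // const signals that calling my_add() does not modify the object.
  float my_add() const {
    return a + b;
  }
};
```

An object can then be created and used in one step, e.g., MyInitializedClass m(1.5f, 2.5f); followed by m.my_add(), which returns 4.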

Inheritance

A new class (child, derived) can inherit members and functions from an existing class (parent, base), and this is termed inheritance. Three access specifiers control who can use the members and functions of a class,

- **public**: any code that can access an object of the class can access its public members and functions;
- **protected**: only the class itself and its derived classes can access its protected members and functions; and
- **private**: only the class itself can access its private members and functions.

The same keywords are used when declaring inheritance, e.g., class Rectangle : public Shape. With public inheritance, the public and protected members of the base class keep their access level in the derived class. The MyClass example declares its members as public and the example below uses public inheritance, which is what is largely used in FIMS. Though, some things in FIMS are declared as private (see SharedInt in rcpp_shared_primitive.hpp).

When setting up inheritance, it’s best to establish a base class with behavior shared with all the derived classes. This means that the base class should contain the data members and functions that are common to all derived classes so code is not duplicated and can be reused easily. By centralizing shared functionality in the base class, you make your code more organized, maintainable, and flexible — any changes to shared behavior only need to be made in one place, and all derived classes will automatically inherit those updates. This approach also makes it easier to extend your code in the future by adding new derived classes that benefit from the existing shared features.

To refer to members (variables or functions) of the current object, this-> is used inside of a class. It is especially useful in derived classes to clarify that you are accessing a member inherited from the base class, or to resolve naming conflicts. For example, if a derived class has a function or variable with the same name as one in the base class, using this-> makes it clear you mean the member of the current object. In most cases, you can access members without this->, but it can help with code clarity and is sometimes required for templates or to avoid ambiguity.
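
The case where this-> is required for templates can be sketched as follows. The class names Base and Doubler are hypothetical and were chosen only for this illustration.

```cpp
template <class Type>
class Base {
 public:
  Type value;
  Base(Type v) : value(v) {}
};

template <class Type>
class Doubler : public Base<Type> {
 public:
  Doubler(Type v) : Base<Type>(v) {}

  Type doubled() {
    // Writing just `value` here is rejected by standard-conforming compilers
    // such as g++ because the base class depends on Type; this-> tells the
    // compiler to look for value in the base class.
    return this->value + this->value;
  }
};
```

For example, Doubler<int> d(4); followed by d.doubled() returns 8.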

In the code below, we create a base class Shape with two public members, length and height. Then, we add derived classes that can inherit from the Shape base class. Finally, we create a function called calculate_areas() that determines the area of the appropriate child class.

Example: Inheritance

class Shape{

public:
double length;
double height;

Shape(){}

};

//Rectangle is a derived class of Shape
class Rectangle : public Shape{
public:
  Rectangle() : Shape() {}

  public:
  //this->length: points to length within base class, etc.
  double area(){
    return this->length * this->height;
  }
};

//Square is also a derived class of Shape
class Square : public Shape{
public:
  Square() : Shape() {}

  public:
  double area(){
    return this->length * this->length;
  }
};

// [[Rcpp::export]]
double calculate_areas(std::string shape, double length, double width = 0){

  double out = 0;

  if (shape == "rectangle"){
      Rectangle rect;
      rect.length = length;
      rect.height = width; //Shape stores the second dimension as height
      out = rect.area();
  } else if (shape == "square") {
      Square sq;
      sq.length = length;
      out = sq.area();
  } else {
      Rcpp::Rcout << "Invalid shape" << std::endl;
  }
  return out;
}

The previous code and the appropriate include statements are combined into cpp/my_inheritance.cpp and can be run from R using Rcpp::sourceCpp(), see below.

Example: Run inheritance example from R

Rcpp::sourceCpp("cpp/my_inheritance.cpp")

calculate_areas("rectangle", 6, 4)

## [1] 24

calculate_areas("square", 2)

## [1] 4

calculate_areas("triangle", 3, 4)

## Invalid shape

## [1] 0

Polymorphism

We saw how derived classes can inherit from base classes. This example resulted in code that required if () statements to calculate a function for the different classes. Polymorphism can be used instead to specify different behaviors for each child class to avoid conditional statements in the code, which are hard to maintain and extend. Polymorphism means that a function or operator behaves differently in different contexts. With classes that inherit from each other, it allows a single function call to produce different results depending on the derived class.

To set up polymorphism in your code you will need to use a virtual function that is overridden in the derived classes, where the version that is called is chosen at run time. Thus, we will set up the base class as before but this time the shared member is a function called area(). Note that the area function is preceded by virtual and set equal to 0, which makes it a pure virtual function. This is done to specify that area has no implementation in the base class and must be overridden by functions from the child classes. Then, we specify virtual functions in each child class with the same name, e.g., area, to override the area function in the base class. Finally, the classes have to be exposed to R using RCPP_MODULE(). See the vignette on Rcpp for a refresher on Rcpp modules.

Example: Polymorphism

class Shape {
  public:
    //constructor
    Shape() {}

    virtual ~Shape() {}

    // Virtual function to calculate area
    virtual double area() = 0;

  };

class Circle : public Shape {
  double radius;

  public:
    Circle(double r) : Shape(){
      radius = r;
    }

  // Override area() for Circle
  virtual double area() {
    return M_PI * radius * radius;
  }
};

class Rectangle : public Shape {
  private:
    double length, height;

  public:
    Rectangle(double l, double h) : Shape(){
      length = l;
      height = h;
    }

  // Override area() for Rectangle
  virtual double area()  {
    return length * height;
  }
};

RCPP_MODULE(shape) {
  Rcpp::class_<Shape>("Shape")
  .method("area", &Shape::area);

  Rcpp::class_<Circle>("Circle")
    .derives<Shape>("Shape")
    .constructor<double>();

  Rcpp::class_<Rectangle>("Rectangle")
    .derives<Shape>("Shape")
    .constructor<double, double>();
}

We can load this code, which is stored in cpp/my_polymorphism.cpp using Rcpp::sourceCpp() and Rcpp::loadModule() where the second argument is TRUE so that all objects are loaded into the R environment. methods::new() is used to create new instances of the desired class.

Example: Load module

Rcpp::sourceCpp("cpp/my_polymorphism.cpp")
# Note that "shape" matches the name passed to RCPP_MODULE
Rcpp::loadModule("shape", TRUE)
c <- methods::new(Circle, 2)
r <- methods::new(Rectangle, 3, 4)

c$area()

## [1] 12.56637

r$area()

## [1] 12
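
The same idea can be sketched without {Rcpp}. The code below mirrors the Shape, Circle, and Rectangle classes and adds a hypothetical helper, total_area, that loops over shapes of different classes without any if () statements. The override keyword and the hard-coded value of pi are choices made for this sketch only.

```cpp
#include <memory>
#include <vector>

class Shape {
 public:
  virtual ~Shape() {}
  // Pure virtual function: every derived class must provide its own area().
  virtual double area() const = 0;
};

class Circle : public Shape {
  double radius;

 public:
  explicit Circle(double r) : radius(r) {}
  double area() const override { return 3.141592653589793 * radius * radius; }
};

class Rectangle : public Shape {
  double length, height;

 public:
  Rectangle(double l, double h) : length(l), height(h) {}
  double area() const override { return length * height; }
};

// Each element runs the area() of its own class, chosen at run time.
double total_area(const std::vector<std::shared_ptr<Shape>>& shapes) {
  double total = 0.0;
  for (const std::shared_ptr<Shape>& shape : shapes) {
    total += shape->area();
  }
  return total;
}
```

Adding a new shape only requires a new derived class with its own area(); total_area does not need to change.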

Pointers and References

Pointers

A pointer is a variable that stores the memory address of another variable as its value. Instead of holding a direct value (like an int or double), a pointer “points to” the location in memory where a value is stored. Pointers are declared using the * operator, for example, int* p; creates a pointer named p to an integer. More specifically, to point to where y is stored we can declare int y = 3; int* x = &y;. The * operator can also be used to access the value the pointer points to, e.g., *x. Pointers are powerful for dynamic memory management, working with arrays, and enabling efficient function arguments. In FIMS, we use shared pointers, e.g., std::shared_ptr<int> y, which we will explore later.

Example: Pointers

Rcpp::cppFunction('
#include <Rcpp.h>
int pointer(){
  float y = 3.1459;
  //initiate a variable x that points to the same address as y
  float* x = &y;
  Rcpp::Rcout << "x is equal to the address of y" << std::endl;
  Rcpp::Rcout << "x is: " << x << std::endl;
  Rcpp::Rcout <<  "The address of y is: " << &y << std::endl;
  Rcpp::Rcout << "*x returns the value of y: " << *x <<  std::endl;
  return 0;
}')

pointer()

## x is equal to the address of y
## x is: 0x7ffcc5f440c4
## The address of y is: 0x7ffcc5f440c4
## *x returns the value of y: 3.1459

## [1] 0

References

A reference variable is an alias for an existing variable. Once a reference is initialized to a variable, it acts as another name for that variable — any changes made through the reference affect the original variable. References are declared using the & operator, for example: int y = 3; int &ref = y; creates a reference named ref that refers to the variable y. Unlike pointers, references must be initialized when declared and cannot be changed to refer to another variable later. References are commonly used to pass variables to functions without making copies, enabling efficient and direct access to the original data. The & operator can also be used to return the memory address of a variable, e.g., &y.

Example: References

Rcpp::cppFunction('
  #include <Rcpp.h>
  int reference() {
    int y = 3;
    int &x = y;
    Rcpp::Rcout << "x is: " << x << std::endl;
    Rcpp::Rcout << "y is: " << y << std::endl;
    Rcpp::Rcout << "The memory address of x is: " << &x << std::endl;
    Rcpp::Rcout << "The memory address of y is: " << &y << std::endl;

    return 0;
  }')

reference()

## x is: 3
## y is: 3
## The memory address of x is: 0x7ffcc5f440c4
## The memory address of y is: 0x7ffcc5f440c4

## [1] 0
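
Passing variables to functions without making copies can be sketched as follows. The three function names are hypothetical; they differ only in how the argument is passed.

```cpp
// Receives a copy: the caller's variable is unchanged.
void add_one_to_copy(int x) {
  x = x + 1;
}

// Receives a reference: the caller's variable is updated.
void add_one_in_place(int& x) {
  x = x + 1;
}

// Receives a pointer: the caller passes an address and the function
// dereferences it with * to reach the original value.
void add_one_via_pointer(int* x) {
  *x = *x + 1;
}
```

Starting from int y = 3;, calling add_one_to_copy(y) leaves y at 3, add_one_in_place(y) changes it to 4, and add_one_via_pointer(&y) then changes it to 5.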

Modifying Pointers

We can use pointers to update values. In the following example, b is a copy while c is a pointer, which is why b does not get updated when *c is updated. We avoid copies in FIMS because when a variable gets updated in the model, we want it updated everywhere in the model.

Example: Updating pointers

Rcpp::cppFunction('
#include <Rcpp.h>
int update_pointer(){
 //initiate a variable
  float a = 3.1459;
  //initiate a new variable with the same value as a
  float b = a;
  //initiate a variable c that points to the same address as a
  float* c = &a;
  *c = 100;
  Rcpp::Rcout << "a and *c have been updated; b has not"  << std::endl;
  Rcpp::Rcout << "a = " << a << "; *c = " << *c << std::endl;
  Rcpp::Rcout << "b = " << b << std::endl;
  return 0;
}')

update_pointer()

## a and *c have been updated; b has not
## a = 100; *c = 100
## b = 3.1459

## [1] 0

We can reassign pointers using the & operator. What do you expect a and *c to return in the following example?

Example: Reassigning pointers

Rcpp::cppFunction('
#include <Rcpp.h>
int reassign_pointer(){
//initiate a variable
  float a = 3.1459;
  //initiate a new variable with the same value as a
  float b = a;
  //initiate a variable c that points to the same address as a
  float* c = &a;
  *c = 100;
  c = &b;
  b = 10;
  Rcpp::Rcout << "c now equals the address of b" << std::endl;
  Rcpp::Rcout << "&a = " << &a << std::endl;
  Rcpp::Rcout << "&b = " << &b << std::endl;
  Rcpp::Rcout << "c = " << c << std::endl;
  Rcpp::Rcout << "a = " << a << std::endl;
  Rcpp::Rcout << "b = " << b << std::endl;
  Rcpp::Rcout << "*c = " << *c <<std::endl;
  return 0;
}')

reassign_pointer()

## c now equals the address of b
## &a = 0x7ffcc5f440c0
## &b = 0x7ffcc5f440c4
## c = 0x7ffcc5f440c4
## a = 100
## b = 10
## *c = 10

## [1] 0

Memory Management

When variables are declared inline, e.g., int x = 3; int* y = &x;, the compiler automatically creates and manages the memory for you. The user can also create and manage memory themselves using the new and delete operators, in which case care must be taken to manage memory properly. If you create a variable using new and do not clean up afterwards using delete, the program can leak memory, i.e., the memory used by the program accumulates over its run time, which can slow down or even crash your program. In FIMS, we use clear(), which removes all pointers, references, and objects created during previous model runs, ensuring that no leftover data or memory remains in the C++ backend. This is important for avoiding memory leaks and for making sure that each new model run starts with a clean slate. In practice, you should run clear() before starting a new model or after finishing one, especially when working interactively in R, to prevent old objects from interfering with new analyses.

Example: Manual memory management

Rcpp::cppFunction('
#include <Rcpp.h>
int manage_memory(){
  //initiate an integer type pointer
  int* ptr= new int;
  *ptr = 35;
  Rcpp::Rcout << "*ptr = " << *ptr <<std::endl;

  //delete memory
  delete ptr;
  return 0;
}')

manage_memory()

## *ptr = 35

## [1] 0
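
The same pattern applies to arrays, with the difference that memory allocated with new[] is released with delete[]. The function sum_of_squares below is a hypothetical example written for this illustration.

```cpp
// Sum the squares 1^2 + 2^2 + ... + n^2 using a dynamically allocated array.
int sum_of_squares(int n) {
  int* squares = new int[n];  // allocate space for n ints
  for (int i = 0; i < n; i++) {
    squares[i] = (i + 1) * (i + 1);
  }
  int total = 0;
  for (int i = 0; i < n; i++) {
    total += squares[i];
  }
  delete[] squares;  // arrays created with new[] are released with delete[]
  return total;
}
```

For example, sum_of_squares(3) returns 14, i.e., 1 + 4 + 9, and the array is released before the function returns.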

Shared pointers

A shared pointer in C++ is a type of smart pointer provided by the standard library (std::shared_ptr) that manages the lifetime of a dynamically allocated object. Unlike a regular pointer, a shared pointer keeps track of how many shared pointers point to the same object using a reference count. When the last shared pointer referencing an object is destroyed or reset, the object’s memory is automatically deallocated. Shared pointers are especially useful when multiple parts of a program need to share ownership of an object, as they help prevent memory leaks and make memory management safer and easier. In FIMS, shared pointers are used to ensure that important model components are accessible from different modules while avoiding manual memory management.

FIMS uses shared pointers to help with memory management. It is recommended to use a shared pointer when the ownership of an object is shared across the program. For example in FIMS, the recruitment module needs to be accessed by information.hpp and population.hpp. The pointer to the recruitment module is therefore shared across the program. C++ automatically deallocates the memory for the object when the last shared pointer that owns it goes out of scope, thus preventing memory leaks in the program.

Specifically, each shared pointer has a reference count that tracks the number of instances in which the shared pointer points to the same object. In the example above, information and population both reference the recruitment module using a shared pointer, so the recruitment module pointer would have a reference count of two. When the reference count drops to zero, the memory where the recruitment module is saved would be automatically deleted.

Declare a shared pointer:

std::shared_ptr<Type> ptr;

Initialize with a newly created object (two alternatives):

std::shared_ptr<Type> ptr(new Type());
std::shared_ptr<Type> ptr = std::make_shared<Type>();

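Initialize using an existing pointer or value (two alternatives):

double* x = new double(3.2);
std::shared_ptr<double> ptr(x); //ptr now owns x, so do not delete x yourself
std::shared_ptr<double> ptr = std::make_shared<double>(3.2); //allocates a new double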

Example: Shared pointers

Rcpp::cppFunction('
  #include <Rcpp.h>
  //std::shared_ptr is defined in <memory>
  #include <memory>
  int shared_pointer(){
   // Creating shared pointers using std::make_shared
    std::shared_ptr<int> ptr1 = std::make_shared<int>(42);
    std::shared_ptr<int> ptr2 = std::make_shared<int>(24);
    // Accessing the values using the (*) operator
    Rcpp::Rcout << "ptr1: " << *ptr1 << std::endl;
    Rcpp::Rcout << "ptr2: " << *ptr2 << std::endl;
    // Set up a new pointer that shares ownership with the ptr1
    std::shared_ptr<int> ptr3 = ptr1;
    // Checking if shared pointer 1 and shared pointer 3 point to the same object
    Rcpp::Rcout << "ptr1 = " << ptr1 << std::endl;
    Rcpp::Rcout << "ptr2 = " << ptr2 << std::endl;
    Rcpp::Rcout << "ptr3 = " << ptr3 << std::endl;

    return 0;
}')

shared_pointer()

## ptr1: 42
## ptr2: 24
## ptr1 = 0x556ab7a04510
## ptr2 = 0x556ab8a79f50
## ptr3 = 0x556ab7a04510

## [1] 0
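
The reference count described above can be inspected with the use_count() member function of std::shared_ptr. The sketch below uses two hypothetical helper functions to show the count going up when a second shared pointer is created and back down when it goes out of scope.

```cpp
#include <memory>

// Returns the reference count while two shared pointers own the same integer.
long count_while_shared() {
  std::shared_ptr<int> first = std::make_shared<int>(42);
  std::shared_ptr<int> second = first;  // second owner, the count becomes 2
  return first.use_count();
}

// Returns the reference count after the second owner has gone out of scope.
long count_after_scope() {
  std::shared_ptr<int> first = std::make_shared<int>(42);
  {
    std::shared_ptr<int> second = first;  // the count is 2 inside this block
  }  // second is destroyed here, so the count drops back to 1
  return first.use_count();
}
```

Here, count_while_shared() returns 2 and count_after_scope() returns 1. In both functions the integer is deallocated automatically when first goes out of scope at the end of the function.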

Andrea Havron