May 19, 2021
Linux Container Primitives: Memory, CPU, Freezer and Device Control Groups
Part nine of the Linux Container series
After discussing the Network and Block I/O Controllers, this post considers the Memory, CPU, Freezer and Device controllers. The following list shows the topics of all scheduled blog posts. It will be updated with the corresponding links as new posts are released.
- An Introduction to Linux Containers
- Linux Capabilities
- An Introduction to Namespaces
- The Mount Namespace and a Description of a Related Information Leak in Docker
- The PID and Network Namespaces
- The User Namespace
- Namespaces Kernel View and Usage in Containerization
- An Introduction to Control Groups
- The Network and Block I/O Controllers
- The Memory, CPU, Freezer and Device Controllers
- Control Groups Kernel View and Usage in Containerization
The Memory Controller (v2)
The memory usage of processes can be accounted and limited. The following types of memory usage are currently tracked [1]:
- Consumed memory in user-space
- Memory usage in kernel-space, as in kernel data structures
- TCP socket buffers
The memory consumed by a control group can be read from the memory.current file. One of the most common ways to limit memory usage is to set the memory.max value, which defines an upper memory limit for all processes residing in a control group.
A cgroup is charged for its memory usage when allocating memory. In turn, the charge is removed once the memory is released again, for example through free() or similar mechanisms. When a process that has allocated memory in the name of a control group is moved to another group, the original group remains charged for the allocated memory, since these mappings are not transferred.
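The two files mentioned above can be combined into a small usage report. The following is a minimal sketch, assuming a cgroup v2 directory is passed as the first argument; the function names cg_mem_report and cg_mem_limit are hypothetical helpers, not part of any tool. Note that memory.max contains the literal string max when no limit is set.

```shell
# Sketch: inspect and limit memory of a cgroup v2 directory.
# The directory (e.g. /sys/fs/cgroup/demo) is an assumption supplied by the caller.

# Print current usage and, if a limit is set, the percentage of the limit in use.
cg_mem_report() {
    cg_dir=$1
    current=$(cat "$cg_dir/memory.current")
    limit=$(cat "$cg_dir/memory.max")
    if [ "$limit" = "max" ]; then
        # "max" is what the kernel reports for "no limit"
        echo "$current bytes used, no limit"
    else
        echo "$current bytes used of $limit ($((current * 100 / limit))%)"
    fi
}

# Set the upper memory limit (in bytes) for all processes in the group.
cg_mem_limit() {
    echo "$2" > "$1/memory.max"
}
```

On a real system, writing to memory.max requires sufficient privileges and the memory controller has to be enabled for the group.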
The CPU Controller (v1/v2)
Various controllers exist for versions 1 and 2 of the control group implementation to manage CPU utilization. There are three controllers for v1: cpu, cpuset and cpuacct.
With cpu, a control group is supplied with a guaranteed minimum amount of CPU time. The cpuset controller allows specifying the set of processors a process is allowed to execute on. For the current process, this can be examined with cat /proc/$$/status | grep Cpus. Changes to this setting propagate to all descendants in the control group hierarchy. Accounting is performed with cpuacct; for example, the file cpuacct.usage reports the CPU time consumed by all processes in a control group in nanoseconds.
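The two values mentioned above are easier to read after a little post-processing. The following is a minimal sketch; the helper names cpus_allowed_list and cpuacct_seconds are hypothetical, and the paths are passed in by the caller (for example /proc/$$/status and a cpuacct v1 group directory).

```shell
# Sketch: helpers for the cpuset and cpuacct information described above.

# Print the list of CPUs a process may run on (e.g. "0-7"),
# taken from the Cpus_allowed_list line of a /proc/<pid>/status file.
cpus_allowed_list() {
    sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "$1"
}

# Convert cpuacct.usage (nanoseconds) of a group directory into seconds
# with millisecond precision, using integer arithmetic only.
cpuacct_seconds() {
    ns=$(cat "$1/cpuacct.usage")
    printf '%d.%03d\n' "$((ns / 1000000000))" "$(((ns % 1000000000) / 1000000))"
}
```

For the current shell, cpus_allowed_list /proc/$$/status prints the same information as the grep command above, reduced to the human-readable list.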
Control group version 2 supports weight-based and absolute CPU limiting models with the cpu subsystem. In contrast to v1, the newer control group version does not support real-time processes. Therefore, all real-time processes have to be moved to the root control group before the cpu v2 subsystem is activated in the cgroup tree.
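In cgroup v2, the weight model is configured through the cpu.weight file and the absolute limit through cpu.max, which holds a quota and a period in microseconds (or the word max instead of a quota when unlimited). The following is a minimal sketch that reads and writes these files; the function names are hypothetical and the group directory is supplied by the caller.

```shell
# Sketch: weight-based and absolute CPU limits in a cgroup v2 directory.

# Relative share compared to sibling groups (the kernel default is 100).
cpu_set_weight() {
    echo "$2" > "$1/cpu.weight"
}

# Absolute limit: the group may run for <quota> microseconds per <period> microseconds.
cpu_set_max() {
    echo "$2 $3" > "$1/cpu.max"
}

# Express the absolute limit as a percentage of a single CPU.
cpu_max_percent() {
    read -r quota period < "$1/cpu.max"
    if [ "$quota" = "max" ]; then
        echo "unlimited"
    else
        echo "$((quota * 100 / period))%"
    fi
}
```

A quota larger than the period is valid and means the group may use more than one CPU, for example 200000 100000 for two full CPUs.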
The Freezer Controller (v1)
This controller does not limit or account for resource usage; instead, it allows freezing a process. Freezing a process suspends its execution. This allows analyzing the current state of a process, with the ability to unfreeze it afterwards and continue the execution without side effects. By creating a checkpoint with the freezer subsystem, it's also possible to move an entire running process, including its children, to another machine or to restart a process from a specific state [2].
For this, the virtual file freezer.state exists, which accepts either FROZEN or THAWED as input values to freeze and unfreeze a process. This works by walking down the control group hierarchy and marking all descendants of a process with the desired state. Additionally, all processes managed by the affected groups have to be moved in or out of the freezer group, depending on the desired freezing state. Freezing itself is done by sending a signal to the affected processes. Also, the freezer has to follow all child processes of the affected processes that may result from calling fork and freeze these as well to prevent freeze escapes [3].
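Since freezing a whole group is not instantaneous, reading freezer.state right after writing to it may still show the transitional value FREEZING. The following is a minimal sketch that writes the desired state and polls until the kernel reports it; the function name freezer_set and the retry count are hypothetical choices, and the freezer v1 group directory is supplied by the caller.

```shell
# Sketch: freeze or thaw a freezer v1 group and wait for the state to settle.
# Usage: freezer_set <group-dir> FROZEN|THAWED [tries]
freezer_set() {
    cg_dir=$1
    wanted=$2
    tries=${3:-5}

    # Only the two input values described above are accepted.
    case "$wanted" in
        FROZEN|THAWED) ;;
        *) echo "invalid state: $wanted" >&2; return 2 ;;
    esac

    echo "$wanted" > "$cg_dir/freezer.state" || return 1

    # Poll until the reported state matches the requested one.
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "$cg_dir/freezer.state")" = "$wanted" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A typical call would be freezer_set /sys/fs/cgroup/freezer/mygroup FROZEN, where the mount point and group name are assumptions that depend on the system.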
The Devices Controller (v1)
This controller implements access controls for devices. Both whitelist and blacklist approaches are possible: by defining exceptions, only very specific accesses are allowed or blocked. Child control groups are forced to have the exact same exception list as the parent or a subset of it. This results in faster checks of whether a rule can be added to the exception list, because only the list of the child has to be checked and not the whole group tree. This controller is one of the few that make use of the hierarchical organization in order to pass configuration information to its child groups.
In the following example, the devices controller is used to prevent a process from accessing /dev/null.
To limit the usage of devices, their major and minor numbers have to be used. These numbers identify a device in the filesystem tree. The major number describes the driver that's required and is used by the kernel to access a specific device. The minor number is used by the device driver to distinguish the logical and physical devices it is responsible for. For /dev/null, these numbers can be identified using stat -c "major: %t minor: %T" /dev/null (which prints them in hexadecimal), yielding 1 for major and 3 for minor.
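Because stat prints %t and %T in hexadecimal, the values have to be converted before they are used in a rule for devices with numbers above 9. The following is a minimal sketch that derives a rule string of the form "type major:minor access" for a device node; the helper names hex_to_dec and device_rule are hypothetical, and GNU stat is assumed.

```shell
# Sketch: build a devices controller rule for a device node.

# Convert a hexadecimal string (as printed by stat %t / %T) to decimal.
hex_to_dec() {
    echo "$((0x$1))"
}

# Usage: device_rule <device-path> [access]
# Prints e.g. "c 1:3 rwm"; access defaults to rwm (read, write, mknod).
device_rule() {
    dev=$1
    access=${2:-rwm}
    if [ -c "$dev" ]; then
        dev_type=c          # character device
    elif [ -b "$dev" ]; then
        dev_type=b          # block device
    else
        echo "not a device node: $dev" >&2
        return 1
    fi
    major=$(hex_to_dec "$(stat -c '%t' "$dev")")
    minor=$(hex_to_dec "$(stat -c '%T' "$dev")")
    echo "$dev_type $major:$minor $access"
}
```

For /dev/null the hexadecimal and decimal values happen to be identical, so the conversion only matters for devices with larger numbers.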
First, a device control group has to be created with cgcreate -g devices:nodevnull, where nodevnull is the identifier of the control group. To add the current shell process to this group, the command cgclassify -g devices:nodevnull $$ is invoked; $$ expands to the process identifier of the current shell. Finally, to deny access to the /dev/null device, this command is executed: cgset -r devices.deny="c 1:3 rwm" nodevnull. The format of the parameter for devices.deny is as follows:
type major:minor access
Here, type is a (all devices), b (block device) or c (character device), and access is any combination of r (read), w (write) and m (mknod).
The device type is determined by the first character of the output of ls -la /dev/null, which shows that it's a character device.
Access to this specific device is now blocked, even for processes running as the root user:
root@box:~# echo "a" > /dev/zero # Allowed
root@box:~# echo "a" > /dev/null # Denied
bash: /dev/null: Operation not permitted
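The cgcreate, cgclassify and cgset tools operate on the cgroup filesystem, so the same steps can be expressed with plain file operations. The following is a minimal sketch, assuming the v1 devices hierarchy is mounted at /sys/fs/cgroup/devices (a common location, but an assumption here); the CG_DEVICES_ROOT variable and the function names are hypothetical.

```shell
# Sketch: the example above using the cgroup v1 filesystem directly.
# CG_DEVICES_ROOT is an assumed mount point and can be overridden by the caller.
CG_DEVICES_ROOT=${CG_DEVICES_ROOT:-/sys/fs/cgroup/devices}

# Usage: deny_device <group> <pid> <rule>
deny_device() {
    group_dir="$CG_DEVICES_ROOT/$1"
    # Creating a directory creates the control group (cgcreate).
    mkdir -p "$group_dir" || return 1
    # Writing a PID to cgroup.procs moves the process into the group (cgclassify).
    echo "$2" > "$group_dir/cgroup.procs" || return 1
    # Writing a rule to devices.deny blocks the described accesses (cgset).
    echo "$3" > "$group_dir/devices.deny"
}

# Usage: allow_device <group> <rule>
# Re-allows accesses that were denied before.
allow_device() {
    echo "$2" > "$CG_DEVICES_ROOT/$1/devices.allow"
}
```

With root privileges, deny_device nodevnull $$ "c 1:3 rwm" corresponds to the three commands above, and allow_device nodevnull "c 1:3 rwm" lifts the restriction again.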
Next post in series
- The next post in this series, 'Control Groups Kernel View and Usage in Containerization', will be published soon.
Credits
The elaboration and software project associated with this subject are results of a Master's thesis created at SCHUTZWERK in collaboration with Aalen University by Philipp Schmied.