This second stack leads to JNI problems on Android, because ART fetches
the address and size of the original stack using pthread functions
(see GetThreadStack in art/runtime/thread.cc), and (presumably) treats
stack addresses outside of the original stack as invalid. (What I don't
understand is why some JNI operations on the CPU thread work fine
despite this but others don't.)
Instead of creating a second stack, let's borrow the approach ART uses:
Use pthread functions to find out the stack's address and size, then
install guard pages at an appropriate location. This lets us get rid
of a workaround we had in the MsgAlert function.
Because we're no longer choosing the stack size ourselves, I've made some
tweaks to where the put the guard pages. Previously we had a stack of
2 MiB and a safe zone of 512 KiB. We now accept stacks as small as 512 KiB
(used on macOS) and use a safe zone of 256 KiB. I feel like this should
be fine, but haven't done much testing beyond "it seems to work".
By the way, on Windows it was already the case that we didn't create
a second stack... But there was a bug in the implementation!
The code for protecting the stack has to run on the CPU thread, since
it's the CPU thread's stack we want to protect, but it was actually
running on EmuThread. This commit fixes that, since now this bug
matters on other operating systems too.
This broke formatting the system memory; see https://bugs.dolphin-emu.org/issues/13176. After calling ticket.DeleteTicket(), ticket.m_bytes was 0-length, but calling ticket.IsV1Ticket() still attempted to read from m_bytes.
This was introduced in 2fd9852ca8, although it didn't actually cause a crash until 929fba08e7.
This fixes a regression from 592ba31. When `a` was a constant 0 and `b`
was a non-constant 0x80000000, the 32-bit negation operation would
overflow, causing an incorrect result. The sign extension needs to happen
before the negation to avoid overflow.
Note that I can't merge the SXTW and NEG into one instruction.
NEG is an alias for SUB with the first operand being set to ZR,
but "SUB (extended register)" treats register 31 as SP instead of ZR.
I've also changed the order for the case where `a` is a constant
0xFFFFFFFF. I don't think the order actually affects correctness here,
but let's use the same order for all the cases since it makes the code
easier to reason about.
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.
This regressed in 9d39647f9e (part of #11183, but it was fine in e97d380437), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).
I'm not aware of any games that are affected by this, though I did not do extensive testing.
Previously we would only backpatch overflowed address calculations
if the overflow was 0x1000 or less. Now we can handle the full 2 GiB
of overflow in both directions.
I'm also making equivalent changes to JitArm64's code. This isn't because
it needs it – JitArm64 address calculations should never overflow – but
because I wanted to get rid of the 0x100001000 inherited from Jit64 that
makes even less sense for JitArm64 than for Jit64.
This fixes a problem I was having where using frame advance with the
debugger open would frequently cause panic alerts about invalid addresses
due to the CPU thread changing MSR.DR while the host thread was trying
to access memory.
To aid in tracking down all the places where we weren't properly locking
the CPU, I've created a new type (in Core.h) that you have to pass as a
reference or pointer to functions that require running as the CPU thread.
This code doesn't need to be portable (since the goal is to have a smaller offset for x64 codegen), so if it's not supported there are other problems. Similar code exists in e.g. DSP.cpp.