33188 Commits

Author SHA1 Message Date
JosJuice
35ce08fb88 UnitTests: Add PageTableHostMappingTest 2026-02-04 21:35:22 +01:00
JosJuice
94283c9639 Core: Don't call InitMMIO from MemoryManager::Init
In the unit test I'm adding in the next commit, I want to call
MemoryManager::Init, but initializing all the hardware that
MemoryManager::InitMMIO calls into would be cumbersome.

Calling MemoryManager::InitMMIO from MemoryManager::Init was a bit
strange anyway. Because MemoryManager::Init is called about halfway
through HW::Init, some of the hardware that MemoryManager::InitMMIO
calls into isn't initialized yet.
2026-02-04 21:35:22 +01:00
JosJuice
b0e2a28e14 Core: Combine guest pages into host pages larger than 4K
Most systems that Dolphin runs on have a page size of 4 KiB, which
conveniently matches the page size of the GameCube and Wii. But there
are also systems that use larger page sizes, notably Apple CPUs with
16 KiB page sizes. To let us create host mappings on such systems, this
commit implements combining guest mappings into host page sized mappings
wherever possible.

For this to work for a given mapping, not only do four (in the case of
16 KiB) guest mappings have to exist adjacent to each other, but the
corresponding translated addresses also have to be adjacent, and the
lowest bits of the addresses have to match. When I tested a few games,
the following percentages of guest mappings met these criteria:

Spider-Man 2: 0%-12%
Rogue Squadron 2: 39%-42%
Rogue Squadron 3: 28%-41%

So while 16 KiB systems don't get as much of a performance improvement
as 4 KiB systems, they do still get some improvement.
2026-02-04 21:35:22 +01:00
JosJuice
0ce95299f6 Core: Don't create page table mappings before R/C bits are set
This gets rid of the hack of setting the R and C bits pessimistically,
reversing the performance regression in Rogue Squadron 3.
2026-02-04 21:35:22 +01:00
JosJuice
9462e9d890 Core: Update page table mappings incrementally
Removing and re-adding every page table mapping every time something
changes in the page table is very slow. Instead, let's generate a diff
and ask Memmap to update only the diff.
2026-02-04 21:35:20 +01:00
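The diff-based approach from 9462e9d890 could look something like the sketch below. The names `Mappings`, `MappingDiff` and `ComputeDiff` are made up for illustration, and the real code may track its mappings quite differently.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical illustration only; not Dolphin's actual data structures.
// Logical page address -> translated page address.
using Mappings = std::map<std::uint32_t, std::uint32_t>;

struct MappingDiff
{
  std::vector<std::uint32_t> removed;  // logical addresses to unmap
  Mappings added;                      // mappings to create
};

// Computes what has to change to get from old_mappings to new_mappings.
// A mapping whose translated address changed is both removed and added.
MappingDiff ComputeDiff(const Mappings& old_mappings, const Mappings& new_mappings)
{
  MappingDiff diff;
  for (const auto& [logical, translated] : old_mappings)
  {
    const auto it = new_mappings.find(logical);
    if (it == new_mappings.end() || it->second != translated)
      diff.removed.push_back(logical);
  }
  for (const auto& [logical, translated] : new_mappings)
  {
    const auto it = old_mappings.find(logical);
    if (it == old_mappings.end() || it->second != translated)
      diff.added.emplace(logical, translated);
  }
  return diff;
}
```

Unchanged mappings appear in neither list, so the cost of applying the diff scales with the number of changed entries rather than with the size of the page table.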
JosJuice
7b885b857e Core: Postpone page table updates when DR is unset
Page table mappings are only used when DR is set, so if page tables are
updated when DR isn't set, we can defer updating page table mappings
until DR gets set. This lets us batch page table updates in the Disney
Trio of Destruction, improving performance when the games are loading
data. It doesn't help much for GameCube games, because those run tlbie
with DR set.

The PowerPCState struct has had its members slightly reordered. I had to
put pagetable_update_pending less than 4 KiB from the start so AArch64's
LDRB (immediate) can access it, and I also took the opportunity to move
some other members around to cut down on padding.
2026-02-04 21:34:07 +01:00
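A rough sketch of the deferral described in 7b885b857e, under stated assumptions: `State`, `OnPageTableChanged`, `SetDR` and `UpdateMappings` are hypothetical stand-ins, and only the member name `pagetable_update_pending` comes from the commit message. The `static_assert` reflects the offset limit mentioned there: the unsigned-offset form of AArch64's LDRB (immediate) encodes a 12-bit byte offset, i.e. 0 to 4095.

```cpp
#include <cstddef>

// Hypothetical illustration only; the real PowerPCState is far larger.
struct State
{
  bool dr = false;                        // data address translation enabled
  bool pagetable_update_pending = false;  // name taken from the commit message
  int mapping_updates = 0;                // counts the (expensive) updates
};

// LDRB (immediate, unsigned offset) can only reach byte offsets 0..4095.
static_assert(offsetof(State, pagetable_update_pending) < 4096);

// Stand-in for the expensive work of refreshing the host mappings.
void UpdateMappings(State& state)
{
  ++state.mapping_updates;
  state.pagetable_update_pending = false;
}

// Called whenever the guest changes the page table.
void OnPageTableChanged(State& state)
{
  if (state.dr)
    UpdateMappings(state);
  else
    state.pagetable_update_pending = true;  // batch until DR gets set
}

// Called when the guest changes the DR bit.
void SetDR(State& state, bool dr)
{
  state.dr = dr;
  if (dr && state.pagetable_update_pending)
    UpdateMappings(state);
}
```

Any number of page table changes made while DR is unset collapse into a single update once DR is set again.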
JosJuice
083f3a7e0e Core: Create fastmem mappings for page address translation
Previously we've only been setting up fastmem mappings for block address
translation, but now we also do it for page address translation. This
increases performance when games access memory using page tables, but
decreases performance when games set up page tables.

The tlbie instruction is used as an indication that the mappings need to
be updated.

There are some accuracy downsides:

* The TLB is now effectively infinitely large, which matters if games
  don't use tlbie when modifying page tables.
* The R and C bits for page table entries get set pessimistically rather
  than when the page is actually accessed.

No games are known to be broken by these inaccuracies, but unfortunately
the second inaccuracy causes a large performance regression in Rogue
Squadron 3. You still get the old, more accurate behavior if Enable
Write-Back Cache is on.
2026-02-04 21:33:56 +01:00
JosJuice
d3ec630904 Core: Pre-shift pagetable_hashmask left by 6
This will make the upcoming commits just a little bit neater to
implement.
2026-02-01 12:39:33 +01:00
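As an illustration of why pre-shifting the mask in d3ec630904 can be convenient: a PTEG on 32-bit PowerPC is 64 bytes, so a hash value is shifted left by 6 to form a byte offset into the page table. The sketch below only shows that masking before or after the shift gives the same address; the function names are hypothetical and the exact expressions Dolphin uses are not taken from the source.

```cpp
#include <cstdint>

// Hypothetical illustration only; function names are assumptions.

// Mask first, then shift: needs the unshifted mask.
constexpr std::uint32_t PtegAddressUnshifted(std::uint32_t base, std::uint32_t hashmask,
                                             std::uint32_t hash)
{
  return base | ((hash & hashmask) << 6);
}

// Shift first, then mask: works with a mask that was shifted once up front.
constexpr std::uint32_t PtegAddressPreShifted(std::uint32_t base,
                                              std::uint32_t hashmask_shifted,
                                              std::uint32_t hash)
{
  return base | ((hash << 6) & hashmask_shifted);
}

// Shifting distributes over bitwise AND, so both forms agree.
static_assert(PtegAddressUnshifted(0x00100000, 0x3FF, 0x12345) ==
              PtegAddressPreShifted(0x00100000, 0x3FF << 6, 0x12345));
```

With the mask stored pre-shifted, code that already has a shifted hash or a byte offset at hand can apply the mask directly.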
JosJuice
08884746ed Core: Detect SR updates 2026-02-01 12:39:33 +01:00
JosJuice
183e12b055 Common/MemArena: Add function for getting page size 2026-02-01 12:39:32 +01:00
Dentomologist
6711d77b99
Merge pull request #14204 from Geotale/update-comment
Update Comments Based On Hardware Test
2026-01-31 15:02:09 -08:00
Joshua Vandaële
e6bc8fb342
WGL: Correctly load wglDestroyPbufferARB extension 2026-01-31 10:36:55 +01:00
Sam Belliveau
51c8f18b73 feat: Add an option to preserve audio pitch when emulation speed changes, integrating it into core configuration and both Qt and Android UIs. 2026-01-27 18:48:22 -05:00
Dentomologist
8f662f7be3 InputConfig: Remove unused local variables
Remove unused vector `controller_names` from `LoadConfig` and
`SaveConfig`. The vector has names added to it but they're never used.

Prior to d03f9032c1 these vectors were
passed to `DynamicInputTextureManager::GenerateTextures`, but that
commit removed those calls.
2026-01-26 11:36:42 -08:00
OatmealDome
c4c2aa8afd
Merge pull request #14253 from JosJuice/dsp-hle-memory
DSP: Remove HLEMemory functions
2026-01-25 22:07:59 -05:00
OatmealDome
164110d370
Merge pull request #14036 from TellowKrinkle/SkipPostprocess
CMake: Default SKIP_POSTPROCESS_BUNDLE to ON
2026-01-25 22:01:29 -05:00
Dentomologist
1b6a45df69
Merge pull request #14214 from JoshuaVandaele/cmake-nonbreaking-improvements
CMake: Various improvements
2026-01-25 18:33:46 -08:00
OatmealDome
168dbb0ab8
Merge pull request #14302 from oltolm/opengl_assert
DX, OGL: fix assert
2026-01-25 21:06:57 -05:00
Martino Fontana
a14c88ba67 Remove unused imports
Yellow squiggly lines begone!
Done automatically on .cpp files through `run-clang-tidy`, with manual corrections to the mistakes.
If an import is directly used, but is technically unnecessary since it's recursively imported by something else, it is *not* removed.
The tool doesn't touch .h files, so I did some of them by hand while fixing errors due to old recursive imports.
Not everything is removed, but the cleanup should be substantial enough.
Because this was done on Linux, code that isn't used on it is mostly untouched.
(Hopefully no open PR is depending on these imports...)
2026-01-25 16:12:15 +01:00
JMC47
533fd18d8a
Merge pull request #14303 from Sintendo/game-ini
Core: Pass game ID as string_view
2026-01-24 15:36:29 -05:00
JosJuice
388b1e861c
Merge pull request #14230 from Sintendo/file-search
Common/FileSearch: Refactor DoFileSearch
2026-01-24 20:42:31 +01:00
oltolm
169f99c14d DX, OGL: fix assert 2026-01-24 20:38:05 +01:00
JMC47
1ef75021b6
Merge pull request #14216 from iwubcode/gameid_fifo_log
Core: add game id to fifo log header
2026-01-24 14:28:23 -05:00
Sintendo
b135537d65 Core/NetPlayServer: Pass game ID as string_view 2026-01-24 18:03:03 +01:00
Sintendo
8e6d95adb1 Core/ConfigManager: Refactor LoadGameIni and friends 2026-01-24 18:03:03 +01:00
Sintendo
bc4b977e9d Core/AchievementManager: Refactor IsApprovedCode and users 2026-01-24 18:02:57 +01:00
Sintendo
c0e75f2821 Core/ConfigLoaders: Refactor GetGameIniFilenames 2026-01-24 17:52:46 +01:00
Sintendo
60ca0626df Remove VectorToJStringArray 2026-01-24 16:50:10 +01:00
Sintendo
972ec95cb3 Clean includes 2026-01-24 16:50:10 +01:00
Sintendo
f2e1c71803 Common/FileSearch: Refactor DoFileSearch 2026-01-24 16:50:10 +01:00
JosJuice
3221e982d3
Merge pull request #13900 from JosJuice/jit-fma-double-rounding
Jit: Implement error-free transformation for single-precision FMA
2026-01-23 21:43:18 +01:00
Dentomologist
009c53ab89
Merge pull request #14146 from jordan-woyak/cached-interp-fix-function-cast-warning
CachedInterpreter: Replace reinterpret_cast with std::bit_cast to resolve -Wcast-function-type-mismatch warnings.
2026-01-21 13:29:44 -08:00
iwubcode
9656332356 Core: add game id to fifo logs; this makes it easier to test graphical enhancements which use the game id to load 2026-01-19 16:03:19 -06:00
Jordan Woyak
2a771937cf
Merge pull request #14294 from JosJuice/textureinfo-getters-header
VideoCommon: Move TextureInfo getters to header
2026-01-19 14:42:20 -06:00
JosJuice
b07c78aabe VideoCommon: Move TextureInfo getters to header
This improves my PC's performance on RS2 Hoth by... 0.1% or so, which I
think is within the margin of error. But this change also cuts down on
boilerplate.
2026-01-19 19:46:21 +01:00
Dentomologist
f4b88af71e JitRegister: Check Open return code
If the call to `Open` a perf map fails don't set `s_is_enabled` (though
it could already be true if you're also using VTUNE) and don't call
`std::setvbuf` with a null stream.

Also fix a typo in a comment (`if` -> `in`)
2026-01-18 17:26:26 -08:00
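The pattern behind f4b88af71e, sketched with plain `std::fopen` rather than Dolphin's `IOFile` wrapper. `OpenPerfMap` and `s_perf_map` are hypothetical names; only `s_is_enabled` and the `std::setvbuf` call are mentioned in the commit message.

```cpp
#include <cstdio>

// Hypothetical illustration only; Dolphin uses its own file wrapper.
static bool s_is_enabled = false;
static std::FILE* s_perf_map = nullptr;

// Opens the perf map and enables registration only if opening succeeded.
bool OpenPerfMap(const char* path)
{
  s_perf_map = std::fopen(path, "w");
  if (s_perf_map == nullptr)
    return false;  // leave s_is_enabled alone, never touch a null stream

  // Only reached with a valid stream; passing a null stream to setvbuf
  // would be undefined behavior.
  std::setvbuf(s_perf_map, nullptr, _IOFBF, 1 << 16);
  s_is_enabled = true;
  return true;
}
```

Returning early on failure keeps both the flag and the buffering call behind the success check.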
Dentomologist
7490dea278 JitRegister: Verify IOFile IsGood
Use IOFile's bool operator to check that it's not just open but good.
2026-01-18 17:26:26 -08:00
Dentomologist
935f537a80 JitRegister: Fix IsEnabled when using VTune without perf
Set `s_is_enabled` to `true` in `Init` when `USE_VTUNE` is defined so
that `IsEnabled` returns true even if perf isn't being used.
2026-01-18 17:26:26 -08:00
Admiral H. Curtiss
710905138c
Merge pull request #14290 from Dentomologist/jitregister_remove_redundant_open_file_check
JitRegister: Remove redundant check for open file
2026-01-19 02:06:57 +01:00
JosJuice
3b1a4739bc JitArm64: Special-case fmadds with single-precision inputs
If all inputs to an fmadds instruction (including cousins like fmsubs,
fnmadd...) are single-precision, then the result is identical between a
double-precision calculation with an error-free transform (whether the
calculation is fused or not) and a single-precision FMA instruction
(must be fused). So as a performance optimization in JitArm64, if we
were going to use double precision with EFT but the inputs are singles,
instead we'll use a normal single-precision FMA instruction without
anything extra. This lets us skip both the EFT and double-to-single
conversions.

Also renaming `inaccurate_fma` to `nonfused` because it's confusing that
`inaccurate_fma` and `m_accurate_fmadds` have such similar names
despite controlling separate things.
2026-01-18 20:03:54 +01:00
JosJuice
58487f1633 Jit: Implement error-free transformation for single-precision FMA
This implements the equivalent of 07443e2d41 in Jit64 and JitArm64.
Aims to fix https://bugs.dolphin-emu.org/issues/13865.
2026-01-18 20:02:49 +01:00
JosJuice
6ac7ffcdd7 Jit64: Return FixupBranch from HandleNaNs
This will be used in the next commit to skip running code that's
unnecessary when the result is NaN.
2026-01-18 20:02:49 +01:00
JosJuice
d5067b6276 Jit64: Replace MOVSD with MOVAPD in software FMA
Should be a little faster by avoiding false dependencies. Note that
there is one remaining MOVSD that really needs to be a MOVSD.
2026-01-18 20:02:49 +01:00
JosJuice
caad84c636 JitArm64: Reduce register pressure for inaccurate FMA with accurate NaNs
If result_reg is set to a temporary register instead of VD because of
accurate NaNs, there's no need to allocate a secondary temporary
register because of inaccurate FMA.
2026-01-18 20:02:49 +01:00
JosJuice
84261cfc23 Arm64Emitter: Fix Q bit of vector SHL/URSHR encoding
This doesn't affect any existing callers, because all existing callers
use quad registers.
2026-01-18 20:02:49 +01:00
JMC47
fe668ebc89
Merge pull request #14033 from JosJuice/jitarm64-inaccurate-fma-double
JitArm64: Always use double precision for inaccurate FMA
2026-01-18 13:52:06 -05:00
JMC47
f8b47c031f
Merge pull request #14291 from JosJuice/defer-textureinfo
VideoCommon: Defer creating TextureInfo
2026-01-18 13:33:14 -05:00
JosJuice
fb07406f10 VideoCommon: Defer creating TextureInfo
TextureCacheBase::LoadImpl has a hot path where the passed-in
TextureInfo never gets used. Instead of passing in a TextureInfo, let's
pass in the stage and create the TextureInfo from the stage if needed.

This unlocks somewhere above an additional 4% performance boost in the
Hoth level of Rogue Squadron 2 on my PC. Performance varies, making it
difficult for me to measure, so treat this as a very approximate number.
2026-01-18 13:04:06 +01:00
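The idea in fb07406f10 is to construct an object only on the path that needs it. A minimal sketch follows, with a stand-in `TextureInfo` type and a made-up `Load` function; neither mirrors the real `TextureCacheBase::LoadImpl` signature.

```cpp
#include <cstdint>

// Hypothetical stand-in; the real TextureInfo is much more expensive to build.
struct TextureInfo
{
  std::uint32_t stage;

  inline static int s_constructed = 0;  // instrumentation for this sketch

  static TextureInfo FromStage(std::uint32_t stage)
  {
    ++s_constructed;
    return TextureInfo{stage};
  }
};

// Takes the stage instead of a ready-made TextureInfo. Returns true if the
// request was served by the hot path, which never needs a TextureInfo.
bool Load(std::uint32_t stage, bool cache_hit)
{
  if (cache_hit)
    return true;  // hot path: nothing constructed

  // Slow path: build the TextureInfo only now that it is needed.
  const TextureInfo info = TextureInfo::FromStage(stage);
  (void)info;  // a real implementation would use it to load the texture
  return false;
}
```

The caller no longer pays the construction cost up front, which matters when most calls take the hot path.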
JosJuice
addededecf JitArm64: Always use double precision for inaccurate FMA
When we're emulating single-precision FMA using an FMA instruction,
there's no precision benefit from using a double-precision instruction,
assuming all inputs are single-precision. But when we're emulating
single-precision FMA using separate multiplication and addition
instructions, there is.

This change increases the precision of inaccurate FMA to the same level
as Jit64, which matters since the only reason we have the inaccurate
FMA mode is for sync compatibility with Jit64.
2026-01-18 10:36:00 +01:00
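The precision argument in addededecf can be checked with a small host-side experiment. This is an illustration in portable C++, not JIT code; the function names are made up. The product of two single-precision values needs at most 48 significand bits, so it is exact in double precision, whereas a single-precision multiplication rounds it before the addition happens.

```cpp
#include <cmath>

// Hypothetical illustration only; not code from either JIT.

// Separate multiply and add, all in single precision.
float MulAddSingle(float a, float b, float c)
{
  // volatile keeps the compiler from contracting this into a fused FMA.
  volatile float product = a * b;  // rounded to 24 significand bits
  return product + c;
}

// Separate multiply and add in double precision, rounded to single at the end.
float MulAddDouble(float a, float b, float c)
{
  volatile double product = static_cast<double>(a) * static_cast<double>(b);  // exact
  return static_cast<float>(product + static_cast<double>(c));
}
```

For example, with `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact result is `2^-24`. `MulAddSingle` returns 0 because the product is rounded to `1 + 2^-11` first, while `MulAddDouble` and `std::fma` both return `2^-24`. The double-precision version can still differ from a fused result in other cases because of double rounding, which is what the error-free transformation commits address.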
Jordan Woyak
362d359787 ARDecrypt: Code modernization. 2026-01-18 01:22:10 -06:00