305 Commits

Author SHA1 Message Date
Ryan Houdek
1bc246735b Add some static_asserts to the Arm32 JIT to make sure ppcState is sane. 2014-09-02 18:38:42 -05:00
Pierre Bourdon
ddb2aefedf Merge pull request #904 from FioraAeterna/dcbz
JIT64: try enabling dcbz again
2014-09-02 15:41:40 +02:00
Lioncash
f8e24de833 Merge pull request #907 from FioraAeterna/rollbacklmw
JIT: revert lmw optimizations
2014-08-31 13:51:24 -04:00
Ryan Houdek
1a6268e6cf Merge pull request #899 from FioraAeterna/checkram
JIT: fix RAM check in load-from-constant-address
2014-08-30 20:49:33 -05:00
Lioncash
beb95b75ca PPCAnalyst: Use std::swap instead of making a temporary variable 2014-08-30 18:32:09 -04:00
Lioncash
eb535be874 Core: Clean up brace placements 2014-08-30 18:06:49 -04:00
Lioncash
1d706b2311 Get rid of C-style empty function parameter indicators 2014-08-30 15:23:48 -04:00
Fiora
1ed6be12b9 JIT: revert lmw optimizations
This seems to break Star Wars Rogue Leader and I have no idea why, so for the
meantime I'm just going to revert it since it's not very important.
2014-08-30 04:17:48 -07:00
Fiora
6f617c4175 JIT64: try enabling dcbz again
This time, check the address carefully beforehand, since apparently some games
do horrible things like running it on non-RAM addresses, or at the very least
virtual addresses.
2014-08-29 12:19:58 -07:00
Fiora
88095a607a JIT: fix RAM check in load-from-constant-address
A bug that seems to have been uncovered by allowing immediate-address loads.
Super Monkey Ball 2 crashes without this change -- it's possible, however, that
the game actually requires the MMU hack, since it crashed due to accessing an
address in the 0x20000000-0x3fffffff range.
2014-08-28 12:54:23 -07:00
Ryan Houdek
ad8fe0fb52 Merge pull request #879 from FioraAeterna/frspx
JIT64: add frspx implementation
2014-08-28 14:12:21 -05:00
Fiora
c359d65dfe JIT64: add frspx implementation 2014-08-28 11:40:31 -07:00
Ryan Houdek
4a78a8a72a Merge pull request #876 from FioraAeterna/floatloadstore
JIT64: clean up and unify float load/store code
2014-08-28 13:37:27 -05:00
Dolphin Bot
359aa664e1 Merge pull request #898 from FioraAeterna/fprffix
JIT: make fprf conditional in fcmp, just like the other instructions
2014-08-28 20:25:26 +02:00
Dolphin Bot
5e514dcfbc Merge pull request #881 from FioraAeterna/mulhwx
JIT64: add mulhwx implementation
2014-08-28 20:25:13 +02:00
Fiora
7929f2f033 JIT: make fprf conditional in fcmp, just like the other instructions
Missed in the FPRF merge (it didn't break anything, but it's probably a bit
slower and not consistent with the others).
2014-08-28 11:19:09 -07:00
Ryan Houdek
0217fb2008 Merge pull request #843 from FioraAeterna/fprf
JIT: Initial FPRF support
2014-08-28 13:15:50 -05:00
Fiora
043256449e Jit64: some load/store optimizations
Avoid extra ops during address calculation in loads; use LEAs or immediates
whenever possible.
2014-08-28 10:12:55 -07:00
Fiora
7e07acbf3f Fix another absent-minded typo in the fmul interpreter patch 2014-08-26 23:00:11 -07:00
Fiora
1a0a33518b Bugfixes for fmul rounding
Fix the places I forgot to add Force25Bit, and fix an incredibly silly typo bug
2014-08-26 21:37:45 -07:00
Fiora
7dbc623dc0 JIT: Initial FPRF support
Doesn't support all the FPSCR flags, just the FPRF ones.
Add PPCAnalyzer support to remove unnecessary FPRF calculations.

POV-ray benchmark with enableFPRF forced on for an extreme comparison:
Before: 1500s
After, fmul/fmadd only: 728s
After, all float: 753s

In real games that use FPRF, like F-Zero GX, FPRF previously cost a few percent
of total runtime.

Since FPRF is so much faster now, if enableFPRF is set, just do it for every
float instruction, not just fmul/fmadd like before. I don't know if this will
fix any games, but there's little good reason not to.
2014-08-26 10:57:03 -07:00
Fiora
288babf414 PPCFP: add comment 2014-08-26 09:08:22 -07:00
Fiora
90324f3809 JIT64: add mulhwx implementation 2014-08-26 01:09:04 -07:00
Fiora
aaca1b01e5 JIT64: clean up and unify float load/store code
While we're at it, support a bunch of float load/store variants that weren't
implemented in the JIT. Might not have a big speed impact on typical games but
they're used at least a bit in povray and luabench.

694 -> 644 seconds on povray.
2014-08-25 19:51:40 -07:00
Lioncash
44ee2f20b9 Merge pull request #874 from FioraAeterna/fixidiocy
JIT: fix incredibly silly mistake in fmul rounding patch
2014-08-25 13:22:33 -04:00
Fiora
f04e362721 JIT: fix incredibly silly mistake in fmul rounding patch 2014-08-25 10:10:28 -07:00
comex
6574682ff5 Remove unused variable m_zero. 2014-08-24 16:22:19 -04:00
comex
d128795594 Merge pull request #862 from comex/registersinuse
Reduce my idiocy in register saving code.
2014-08-24 16:16:32 -04:00
comex
a7752f49be Merge pull request #861 from comex/warnings
Fix warnings for OS X
2014-08-24 16:15:58 -04:00
comex
cf01f47b52 Fix bloody printf specifiers.
In particular, even in code that only runs on x86-64, you can't use
PRIx64 for size_t because, on OS X, one is unsigned long and the other
is unsigned long long and clang whines about the difference.  I guess
you could make a size_t specifier macro, but those are horribly ugly, so
I just used casting.

Anyone want to make a nice (and slow) template-based printf?

Now without bare 'unsigned'.
2014-08-24 15:56:41 -04:00
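The casting approach mentioned in cf01f47b52 above can be sketched roughly as follows. This is an illustration only, not code from the commit: `FormatSizeHex` is a hypothetical helper, and it assumes the fix amounts to widening the `size_t` to a fixed-width type whose `PRIx64` specifier is guaranteed to match.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the Dolphin source): format a size_t as hex.
// size_t may be unsigned long or unsigned long long depending on the platform,
// so it is cast to uint64_t, which always matches the PRIx64 specifier.
std::string FormatSizeHex(std::size_t size)
{
  char buffer[17];  // up to 16 hex digits plus the terminating NUL
  std::snprintf(buffer, sizeof(buffer), "%" PRIx64, static_cast<std::uint64_t>(size));
  return buffer;
}
```

The cast keeps the format string portable without introducing a dedicated `size_t` specifier macro.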
Pierre Bourdon
ebf1b98106 Merge pull request #834 from FioraAeterna/fixfmulrounding
JIT64: Fix fmul rounding issues
2014-08-24 19:49:56 +02:00
Fiora
4d7b1275c9 Interpreter: apply the same odd rounding to single multiplies as the JIT 2014-08-24 10:28:52 -07:00
Fiora
4f18f6078f JIT64: Fix fmul rounding issues
Thanks to magumagu's softfp experiments, we know a lot more about the Wii's
strange floating point unit than we used to. In particular, when doing a
single-precision floating point multiply (fmulsx), it rounds the right hand
side's mantissa so as to lose the low 28 bits (of the 53-bit mantissa).

Emulating this behavior in Dolphin fixes a bunch of issues with games that
require extremely precise emulation of floating point hardware, especially
game replays. Fortunately, we can do this with rather little CPU cost; just ~5
extra instructions per multiply, instead of the vast load of a pure-software
float implementation.

This doesn't make floating-point behavior at all perfect. I still suspect
fmadd rounding might not be quite right, since the Wii uses fused instructions
and Dolphin doesn't, and NaN/infinity/exception handling is probably off in
various ways... but it's definitely way better than before.

This appears to fix replays in Mario Kart Wii, Mario Kart Double Dash, and
Super Smash Brothers Brawl. I wouldn't be surprised if it fixes a bunch of
other stuff too.

The changes to instructions other than fmulsx may not be strictly necessary,
but I included them for completeness: it feels wrong to fix some instructions
but not others, and some games we didn't test might rely on them.
2014-08-24 10:28:52 -07:00
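The mantissa rounding described in 4f18f6078f above can be sketched as a bit operation on the double's representation. This is a minimal sketch under stated assumptions, not the code from the commit: `DropLow28MantissaBits` is a hypothetical name, and the round-up-on-bit-27 behavior is an assumption about how "losing the low 28 bits" is rounded.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (name and rounding mode are assumptions): clear the low
// 28 bits of a double's 53-bit mantissa, rounding up when bit 27 is set.
double DropLow28MantissaBits(double value)
{
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));

  // Clear bits 0..26, then add bit 27 to itself: if it was set, the carry
  // propagates into bit 28 and bit 27 becomes zero; otherwise nothing changes.
  bits = (bits & 0xFFFFFFFFF8000000ULL) + (bits & 0x0000000008000000ULL);

  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

In this sketch the helper would be applied to the right-hand operand before the multiply, which costs only a few integer operations compared with a software float implementation.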
Pierre Bourdon
aaff5a0afb Merge pull request #856 from FioraAeterna/ppcfpopt
JIT: faster PPC_FP code
2014-08-24 19:25:56 +02:00
comex
d19ec35363 Reduce my idiocy in register saving code.
(1) Rename ABI_ALL_CALLEE_SAVED to ABI_ALL_CALLER_SAVED, because that's
what it was actually defined as (and used as).  Derp.

(2) RegistersInUse is always used for the purpose of saving registers
before calling a C++ function in the middle of a JIT block (without
flushing).  There is no need to save callee-saved registers in this
case.  Change the name to CallerSavedRegistersInUse and mask with
ABI_ALL_CALLER_SAVED.

Nothing obvious broke when starting up a Melee game.  (I added a test
for anything actually being masked out; it happens, but in this
particular case seemed to occur at most a few dozen times per second, so
the actual performance benefit is probably negligible.)
2014-08-23 15:46:10 -04:00
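The masking described in point (2) of d19ec35363 above can be illustrated with a plain bitmask. The sketch below is an assumption-laden illustration, not Dolphin's definition: the `Reg` enum, `Bit`, and `kAllCallerSaved` are hypothetical, and the mask covers only the System V x86-64 caller-saved general-purpose registers (the real mask may differ per platform and may include other register classes).

```cpp
#include <cstdint>

// x86-64 general-purpose registers in hardware encoding order.
enum Reg : std::uint32_t
{
  RAX = 0, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
  R8, R9, R10, R11, R12, R13, R14, R15
};

constexpr std::uint32_t Bit(Reg reg)
{
  return 1u << reg;
}

// Assumption: System V x86-64 caller-saved GPRs only (hypothetical stand-in
// for ABI_ALL_CALLER_SAVED).
constexpr std::uint32_t kAllCallerSaved =
    Bit(RAX) | Bit(RCX) | Bit(RDX) | Bit(RSI) | Bit(RDI) |
    Bit(R8) | Bit(R9) | Bit(R10) | Bit(R11);

// Only registers that the callee is allowed to clobber need saving before
// calling a C++ function from the middle of a JIT block.
constexpr std::uint32_t CallerSavedRegistersInUse(std::uint32_t registers_in_use)
{
  return registers_in_use & kAllCallerSaved;
}
```

For example, if RAX, RBX, and R12 are live, only RAX survives the mask, because the called function must preserve RBX and R12 itself.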
Fiora
59c1a46ab1 JIT: faster PPC_FP code
The PPC_FP conversion code can be made a lot simpler with the observation
that the only values that need to be sent through the slow x87 path are
denormals.

A whole bunch faster: 708->678 seconds on POV-RAY.
2014-08-23 07:44:42 -07:00
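The observation in 59c1a46ab1 above implies a cheap test that decides whether a value must take the slow path. The commit message does not say how or on which representation the check is done, so the following is only a sketch: `IsDenormalSingle` is a hypothetical predicate that inspects an IEEE 754 single-precision bit pattern.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical predicate (an assumption, not the commit's code): a single is
// denormal when its 8-bit exponent field is zero and its mantissa is nonzero.
bool IsDenormalSingle(float value)
{
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));

  const std::uint32_t exponent = (bits >> 23) & 0xFF;
  const std::uint32_t mantissa = bits & 0x007FFFFF;
  return exponent == 0 && mantissa != 0;
}
```

Under this sketch, normal values, zeros, infinities, and NaNs would all stay on the fast path, and only values for which the predicate returns true would fall back to the slower conversion.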
Ryan Houdek
fc5a73d62e Merge pull request #849 from FioraAeterna/fixppcanalyzer
PPCAnalyzer: move num_instructions initialization to correct place
2014-08-22 00:14:11 -05:00
Lioncash
25e29c323d Merge pull request #842 from lioncash/jit
Coding style clean up for the Jit, JitARM and JitIL
2014-08-21 15:25:43 -04:00
Fiora
5c0145f71b PPCAnalyzer: move num_instructions initialization to correct place
Much of the PPC Analyzer code (e.g. instruction reordering for merging
branches) wasn't actually being run.
2014-08-21 11:19:23 -07:00
Lioncash
f17dcd2019 Merge pull request #764 from magcius/new-nogui-2
Rewrite GLInterface
2014-08-21 14:14:54 -04:00
Lioncash
20f8ec9afa Core: Join a few if statements in IR.cpp 2014-08-20 14:26:16 -04:00
Lioncash
99ae79f7f9 Core: Better assert messages for stx op 2014-08-20 14:16:05 -04:00
Lioncash
d694637938 Core: Clean up brace/body placements for JitIL 2014-08-20 14:04:01 -04:00
Lioncash
b005ba2797 Core: Clean up body/brace placements in JitArm32 2014-08-20 14:03:46 -04:00
Lioncash
6145ced5f7 Core: Change "unsigned result" to "u32 result" in rlwinmx for Jit64 2014-08-20 12:51:30 -04:00
Lioncash
e7f49692e8 Core: Clean up body/brace placements in Jit64 and JitCommon 2014-08-20 12:50:42 -04:00
Shawn Hoffman
043ea31f14 msvc: resolve all warnings in Core.
Note: vs14 will support empty macro parameter as used by MMIO.
2014-08-19 22:33:46 -07:00
Jasper St. Pierre
3bad4bcfdb PPCSymbolDB: Don't show any messages in the status bar
These don't really help anybody. We don't even have a status bar
in MainNoGUI -- status bar text should be controlled by the UI, not the
core code!
2014-08-19 10:09:33 -04:00
Ryan Houdek
355f7b366b Merge pull request #831 from FioraAeterna/cleanupimm
JIT: cleanup unnecessary immediate size-checking logic
2014-08-19 04:15:22 -05:00
Dolphin Bot
2bcc8d414c Merge pull request #807 from FioraAeterna/avoidpcstore
JIT: avoid saving the PC on every store
2014-08-19 11:12:54 +02:00