Commit Graph

645 Commits

Author SHA1 Message Date
Ryan Houdek
bfbbddd76f Rewrites ARMv7 fastmem entirely.
This is a fairly lengthy change that can't be split cleanly into multiple commits because fastmem is a bit of an intertwined mess.
This makes maintaining fastmem on ARMv7 much easier because I no longer have to do any terrible instruction counting and NOP padding. It really
stops my brain from hurting when working with it.

This enables fastmem for a whole bunch of new instructions, which means that essentially all loadstore instructions now support fastmem. This also
rewrites the floating point loadstores again, because the last implementation was pretty poor performance-wise, even if it was the
cleanest implementation from my point of view.

This started with me rewriting the fastmem routines to work just like the previous implementation of floating point loadstores. That was
when I noticed that performance tanked, so I decided to rewrite all of it.

This also implements gatherpipe optimizations alongside constant address optimizations.

Overall, this commit brings a fairly large speed boost when using fastmem.
2014-11-21 05:21:57 -06:00
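A small C++ sketch can illustrate the constant-address optimization mentioned above. It only shows the idea: when a store's effective address is already known at JIT time, the JIT can decide which store path to emit. The enum, the function name, and the address ranges are assumptions made for illustration. They treat 0xCC008000 as the gather pipe register and assume 24 MiB of MEM1 mirrored at 0x80000000 and 0xC0000000. This is not the code from this commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of store paths a JIT could pick between at compile time.
enum class StorePath { GatherPipe, DirectRam, SlowCall };

// Assumption: the write-gather pipe register on GameCube/Wii.
constexpr uint32_t kGatherPipeAddr = 0xCC008000u;
// Assumption: 24 MiB of MEM1, mirrored cached (0x8...) and uncached (0xC...).
constexpr uint32_t kMem1Size = 24u * 1024u * 1024u;

// Decide, for an address known when the block is compiled, which path to emit.
StorePath ClassifyConstantStore(uint32_t addr) {
  if (addr == kGatherPipeAddr)
    return StorePath::GatherPipe;  // emit an inline append to the FIFO buffer
  const uint32_t segment = addr >> 30;         // 2 = 0x8xxxxxxx, 3 = 0xCxxxxxxx
  const uint32_t offset = addr & 0x3FFFFFFFu;  // physical offset within the mirror
  if ((segment == 2 || segment == 3) && offset < kMem1Size)
    return StorePath::DirectRam;   // emit a plain host store into emulated RAM
  return StorePath::SlowCall;      // anything else goes through the generic write routine
}
```

Because the check runs once at compile time, the emitted code for a known address needs no runtime range test at all.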
Lioncash
19dabee326 Merge pull request #1568 from rohit-n/android-warnings
Android: Silence a few warnings.
2014-11-19 12:17:13 -05:00
skidau
2affe25191 Merge pull request #1559 from Sonicadvance1/armv7-minor-optimizations
ARMv7 block profiling + minor optimization
2014-11-18 12:51:03 +11:00
Rohit Nirmal
8ec791c4f3 Android: Silence a few warnings. 2014-11-17 19:21:38 +00:00
Ryan Houdek
f9208dcc13 Fixes ARMv7 FP loadstores using fastmem when not enabled. 2014-11-16 21:12:11 -06:00
comex
aa2fc1f66b Merge pull request #1449 from comex/memtools-merge
Reorganize faulting stuff. Differentiate between arch- and OS-specific defines.
2014-11-16 13:46:33 -05:00
Ryan Houdek
30e1749d00 Implements block time profiling on ARMv7.
This was interesting to implement.
Our generic QueryPerformanceCounter function on ARMv7 was so slow that profiling a block was impossible.
I waited about five minutes and couldn't get even a single frame to output.
This instead uses ARMv7's PMU to get cycle counts, which costs relatively little performance in my testing.
One disadvantage of this method is that the kernel can lock us out of using these co-processor registers, but it seems to work on my Jetson board.
Another disadvantage is that block times are measured in cycles rather than "real" time, which is not too big of a deal.

This also removes instruction run counts from profiling, because they're just annoying and our UI doesn't even expose an interface for getting
those results.
2014-11-16 09:29:27 +00:00
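Here is a sketch of reading the ARMv7 PMU cycle counter (PMCCNTR) from user space, as the block-profiling commit above describes. It assumes the kernel has already enabled the counter and granted user-mode access through PMUSERENR. Without that, the MRC instruction traps, which is the "kernel can lock us out" caveat. The function name is hypothetical. The non-ARM fallback to std::chrono::steady_clock exists only so the sketch builds and runs elsewhere.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: returns a cycle count on 32-bit ARM, nanoseconds elsewhere.
inline uint64_t ReadCycleCounter() {
#if defined(__arm__)
  // PMCCNTR lives in CP15 at c9, c13, 0. Assumes the counter is enabled and
  // user access is allowed (PMUSERENR.EN); otherwise this is an undefined instruction.
  uint32_t cycles;
  asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
  return cycles;  // only 32 bits wide, so it wraps quickly at GHz clock rates
#else
  // Fallback so the sketch runs on non-ARM hosts; not a cycle count.
  using namespace std::chrono;
  return static_cast<uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
#endif
}
```

Because PMCCNTR is 32 bits wide, per-block deltas on ARM should be taken modulo 2^32 (for example, by subtracting two `uint32_t` values).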
Ryan Houdek
6683b194ff ARMv7 register cache optimizations.
Enable support for not loading a destination register in the FPR cache.
Dump registers if they won't be used later in the block. Stolen from Fiora.
2014-11-16 09:29:22 +00:00
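The "dump registers if they won't be used later in the block" idea amounts to a backward scan over the block. The sketch below uses a hypothetical `Inst` type that records only which guest registers each instruction reads. For every instruction it computes the set of registers read later in the block. A cached register outside that set can be written back (if dirty) and its host register released. This is a conservative approximation of liveness, not the register cache code from the commit.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a decoded guest instruction.
struct Inst {
  std::bitset<32> reads;  // guest registers this instruction reads
};

// used_after[i] = registers read by some instruction after i in the block.
// Registers not in used_after[i] can be dumped from the cache after instruction i.
std::vector<std::bitset<32>> RegsUsedLater(const std::vector<Inst>& block) {
  std::vector<std::bitset<32>> used_after(block.size());
  std::bitset<32> used;  // registers read at or after the current position
  for (std::size_t i = block.size(); i-- > 0;) {
    used_after[i] = used;
    used |= block[i].reads;
  }
  return used_after;
}
```

One backward pass is enough, so the analysis costs a single bitset OR per instruction at compile time.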
Lioncash
3eab75bc9c Core: Join some variable declarations and assignments 2014-11-15 20:21:35 -05:00
Ryan Houdek
181f16c5f0 Reimplements fastmem for ARMv7 floating point loadstores.
This implements a new system for fastmem backpatching on ARMv7 that is less of a mindfsck to deal with.
This also moves stfs onto the default loadstore path; I'm not sure why it was handled separately in the first place.

I'll be moving the rest of the loadstore methods over to this new way in a few days.
2014-11-15 21:17:50 +00:00
Ryan Houdek
b0becf7af8 Enables the ARMv7 FPR cache.
The instructions that caused problems with the FPR cache were disabled in the previous commit, so the cache can now be re-enabled for fairly large performance gains.
2014-11-14 15:14:10 +00:00
Ryan Houdek
69c3e6516c Disables NEON optimized instructions.
These cause issues in games; in particular, Animal Crossing shows pink on the screen.
Disable until fully investigated.

This also disables fastmem on floating point loadstore instructions which are horribly broken and won't actually backpatch when an invalid read/write
is encountered.
2014-11-14 15:13:13 +00:00
Stevoisiak
b25e1a2eb4 Various formatting and consistency fixes 2014-11-13 22:42:18 -05:00
skidau
b1f8974db8 Merge pull request #1527 from FioraAeterna/mftbfix
JIT: revert accuracy improvement to mftb
2014-11-13 12:11:13 +11:00
Fiora
4b105ed0e4 JIT: revert accuracy improvement to mftb
Fixes a few games (e.g. Karaoke Revolution Party) for reasons explained in the
comments.
2014-11-10 20:31:07 -08:00
Fiora
6603f98d04 JIT: add 64-bit write support to FIFO functions
Also fix 64-bit values passed to CallAC and otherwise correct immediate
handling in FIFO writes.
Fixes 007 Nightfire.
2014-11-09 21:24:30 -08:00
skidau
c36e7b9c23 Merge pull request #1494 from lioncash/statics
PPCCache: Make PLRU lookup tables static
2014-11-07 12:33:19 +11:00
Lioncash
a5d304eb16 Merge pull request #1495 from lioncash/unused
Interpreter: Remove dead patches() function
2014-11-06 20:31:01 -05:00
Fiora
b8d88a41e0 JIT: remove accidentally left-in debug code 2014-11-05 17:44:13 -08:00
Lioncash
606efbce10 PPCCache: Make PLRU lookup tables static 2014-11-05 19:35:30 -05:00
Rohit Nirmal
1beb047959 PowerPC: Remove unused variable. 2014-11-05 11:47:44 -05:00
Lioncash
a105a9a557 Interpreter: Remove dead patches() function 2014-11-04 20:44:57 -05:00
skidau
0515ab852e Merge pull request #1230 from FioraAeterna/constaddr
JIT: improve handling of stores with a known address
2014-11-05 12:40:38 +11:00
Fiora
b81686b582 JIT: fix register preloading
Partially broken by typos in the bitset patch.
2014-11-04 04:50:05 -08:00
Fiora
f8880c0284 JIT: fix typo in optimization patch
Whoops... made us flush everything on every branch.
2014-11-04 02:04:30 -08:00
comex
9cba787871 Merge pull request #1408 from randomstuff/perf
Profiling: measure time on non-Windows/POSIX using clock_gettime
2014-11-03 22:36:32 -05:00
Lioncash
30f97723db Core: Fix potentially uninitialized variable warnings 2014-11-03 22:21:10 -05:00
comex
42d41a456e Merge pull request #1489 from FioraAeterna/revertopt
JIT: revert cmpXX optimization
2014-11-03 21:07:11 -05:00
Fiora
768273f59b JIT: revert cmpXX optimization
It seems like this wasn't correct in 100% of cases.
2014-11-03 17:50:20 -08:00
skidau
027791685a Merge pull request #1483 from comex/on-demand-exi-interrupts
Make EXI use CoreTiming events like everything else instead of having its own special check.
2014-11-04 12:31:12 +11:00
Fiora
ce71c3cd4e JIT: fix valid_block marking
This caused invalidations that only affected the last portion of a JIT block
to fail, breaking Wii64's block linking. It might affect a bunch of other
games too; I haven't tested.
2014-11-03 16:23:44 -08:00
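The valid_block fix can be pictured as a map with one flag per 32-byte line of guest memory. Each flag is set for every line a compiled block was translated from. The class below is a hypothetical sketch, not Dolphin's JitCache. It shows why a block's final line must be marked too: otherwise a write that touches only the tail of a block finds no flag, and the stale block survives.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one flag per 32-byte line of guest memory.
class ValidBlockMap {
 public:
  explicit ValidBlockMap(uint32_t mem_bytes) : lines_(mem_bytes / 32) {}

  // Mark every line the block spans, including its final (possibly partial) line.
  void MarkBlock(uint32_t start, uint32_t size_bytes) {
    assert(size_bytes > 0);
    for (uint32_t line = start / 32; line <= (start + size_bytes - 1) / 32; ++line)
      lines_[line] = true;
  }

  // An invalidation of [addr, addr + len) only needs a block lookup if this is true.
  bool MayContainCode(uint32_t addr, uint32_t len) const {
    assert(len > 0);
    for (uint32_t line = addr / 32; line <= (addr + len - 1) / 32; ++line)
      if (lines_[line]) return true;
    return false;
  }

 private:
  std::vector<bool> lines_;
};
```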
Fiora
fc63c7ecae JIT: genericize immediate address handling, support in float stores too 2014-11-03 01:31:39 -08:00
comex
9f683f353b Make EXI use CoreTiming events like everything else instead of having its own slow special check.
The microphone is probably mistimed because it doesn't take into account how many cycles late the event fires,
but that's not a new issue here.
2014-11-03 00:28:46 -05:00
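The CoreTiming approach can be sketched as a cycle-ordered event queue: a device schedules a callback for a future cycle instead of being polled on every check. The `Scheduler` class and its method names are hypothetical and are not CoreTiming's API. The `cycles_late` argument is the value the microphone note above refers to: how far past its deadline an event actually ran.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal cycle-based event scheduler.
class Scheduler {
 public:
  using Callback = std::function<void(int64_t cycles_late)>;

  void ScheduleEvent(int64_t cycles_into_future, Callback cb) {
    events_.push({now_ + cycles_into_future, seq_++, std::move(cb)});
  }

  // Advance emulated time and fire every event whose deadline has passed.
  void Advance(int64_t cycles) {
    now_ += cycles;
    while (!events_.empty() && events_.top().when <= now_) {
      Event ev = events_.top();
      events_.pop();
      ev.cb(now_ - ev.when);  // report how many cycles late the event fired
    }
  }

 private:
  struct Event {
    int64_t when;   // absolute cycle the event is due
    uint64_t seq;   // tie-breaker: same-cycle events fire in scheduling order
    Callback cb;
  };
  struct Later {
    bool operator()(const Event& a, const Event& b) const {
      return a.when != b.when ? a.when > b.when : a.seq > b.seq;
    }
  };
  int64_t now_ = 0;
  uint64_t seq_ = 0;
  std::priority_queue<Event, std::vector<Event>, Later> events_;
};
```

A device that ignores `cycles_late` fires correctly but drifts by that amount on every event, which matches the mistiming described above.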
Fiora
e729fc4a28 JIT: fix dumb mistake in crclr optimization patch 2014-11-02 21:03:11 -08:00
Ryan Houdek
204598a082 Merge pull request #1350 from FioraAeterna/integeropts
Various smallish JIT optimizations
2014-11-02 20:13:20 -06:00
Gabriel Corona
641e820257 Profiling: measure time on POSIX systems using clock_gettime 2014-11-03 00:07:12 +01:00
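Here is a minimal sketch of timing with clock_gettime on POSIX systems. The log does not say which clock the commit uses. CLOCK_MONOTONIC is an assumption, chosen because wall-clock adjustments do not affect it. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper: monotonic timestamp in nanoseconds via POSIX clock_gettime.
uint64_t MonotonicNanos() {
  timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
    return 0;  // clock unavailable; callers should treat 0 as "no data"
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<uint64_t>(ts.tv_nsec);
}
```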
Ryan Houdek
6e43562496 Merge pull request #1468 from Tilka/cleanup
Small cleanup
2014-11-02 11:02:35 -06:00
Ryan Houdek
0d70880d89 Merge pull request #1466 from Sonicadvance1/ARMv7-and-optimization
Optimizes ARMv7 andi{s,}_rc implementations.
2014-11-02 09:33:37 -06:00
Ryan Houdek
824bad458c Merge pull request #1454 from lioncash/interp
Interpreter: Remove a redundant macro
2014-11-02 09:33:19 -06:00
Tillmann Karras
f4fed0dea0 JitAsm: remove unused code pointers 2014-11-02 02:00:47 +01:00
Ryan Houdek
86ca63658b Optimizes ARMv7 andi{s,}_rc implementations.
Cuts the implementation from up to 3 instructions down to 1 instruction if the immediate fits into the instruction encoding.
2014-11-01 13:06:52 +00:00
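Whether the one-instruction form applies comes down to the A32 "modified immediate" rule. A data-processing immediate must be an 8-bit value rotated right by an even amount. The helper below is a hypothetical sketch of that check, not the emitter's own function.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by r bits (0 <= r < 32).
constexpr uint32_t RotL(uint32_t v, unsigned r) {
  return r == 0 ? v : (v << r) | (v >> (32 - r));
}

// True if imm is an 8-bit value rotated right by an even amount, i.e. if
// undoing some even rotation (rotating left) leaves a value that fits in 8 bits.
constexpr bool IsArmModifiedImm(uint32_t imm) {
  for (unsigned rot = 0; rot < 32; rot += 2) {
    if (RotL(imm, rot) <= 0xFFu) return true;
  }
  return false;
}
```

When the check fails, the mask has to be materialized in a scratch register before the AND, so the sequence becomes longer.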
Fiora
7deaf00c44 JIT: more mftb fixes
A very subtle difference in how I calculated the timebase value seems
to have broken Karaoke Revolution; this seems to fix it. Also be a bit more
paranoid in conditions for mftb merging.
2014-11-01 03:15:25 -07:00
Lioncash
475bb40364 Interpreter: Remove a redundant macro 2014-10-31 10:55:25 -04:00
comex
2ecd849eab Reorganize faulting stuff. Differentiate between arch- and OS-specific defines.
- Get rid of ArmMemTools.cpp and rename x64MemTools.cpp to MemTools.cpp.
  ArmMemTools was almost identical to the POSIX part of x64MemTools, and
  I don't think either of the two differences is necessary: (a) the lack
  of sigaltstack, which I recently added to the latter, and (b) the use
  of r10 to determine the fault address instead of info->si_addr
  (meaning it only works for specifically formatted JIT code).  (Plus
  Android, see below.)

- Rename Core/PowerPC/JitCommon/JitBackpatch.h to Core/MachineContext.h.
  It doesn't contain anything JIT-specific anymore, and e.g. locking
  will want to use faulting support regardless of whether any JIT is in
  use.

- Get rid of different definitions of SContext for different
  architectures under __linux__, since this is POSIX.  The exception is
  of course Android being shitty; I moved the workaround definition from
  ArmMemTools.cpp to here.

- Get rid of #ifdefs around EMM::InstallExceptionHandler and just
  provide an empty implementation for unsupported systems (i.e.
  _M_GENERIC really).  Added const bool g_exception_handlers_supported
  for future use; currently exception handlers are only used by the JIT,
  whose use implies non-_M_GENERIC, but locking will change that.

- Remove an unnecessary typedef.
2014-10-31 00:14:06 -04:00
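Here is a Linux-only sketch of the approach described above. It installs a SIGSEGV handler with SA_SIGINFO on an alternate stack (sigaltstack), and the handler identifies the fault through info->si_addr rather than a register convention in JIT code. To keep the sketch runnable, "handling" the fault here just unprotects a guard page and lets the access retry; a JIT would instead backpatch the faulting code. All names are hypothetical. POSIX does not list mprotect as async-signal-safe, although on Linux it is a plain system call.

```cpp
#include <cassert>
#include <cstdlib>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static char* g_guard_page = nullptr;
static size_t g_page_size = 0;
static volatile sig_atomic_t g_faults = 0;

static void SegvHandler(int sig, siginfo_t* info, void*) {
  char* addr = static_cast<char*>(info->si_addr);  // fault address from the OS
  if (g_guard_page && addr >= g_guard_page && addr < g_guard_page + g_page_size) {
    mprotect(g_guard_page, g_page_size, PROT_READ | PROT_WRITE);
    g_faults = g_faults + 1;
    return;  // returning re-executes the faulting access
  }
  signal(sig, SIG_DFL);  // not our fault: restore default so it re-faults and crashes
}

bool InstallFaultHandler() {
  // Alternate stack so the handler can still run if the fault was a stack overflow.
  static char alt_stack[64 * 1024];  // fixed size; SIGSTKSZ may not be constant
  stack_t ss{};
  ss.ss_sp = alt_stack;
  ss.ss_size = sizeof(alt_stack);
  if (sigaltstack(&ss, nullptr) != 0) return false;

  struct sigaction sa{};
  sa.sa_sigaction = SegvHandler;
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Writes to a PROT_NONE page; returns how many faults the handler resolved.
int DemoGuardPage() {
  g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, g_page_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;
  g_guard_page = static_cast<char*>(p);
  volatile char* target = g_guard_page;
  target[0] = 42;  // faults once, then succeeds after the handler unprotects the page
  const int result = (target[0] == 42) ? static_cast<int>(g_faults) : -1;
  munmap(p, g_page_size);
  g_guard_page = nullptr;
  return result;
}
```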
Fiora
fb0960f0ee JIT: flush unused registers during branch merges
Also correct some flags in interpreter tables.
2014-10-29 00:32:59 -07:00
Fiora
1ec1a9c33a JIT: optimize crclr special case of crxor 2014-10-29 00:30:27 -07:00
Fiora
97fba41860 JIT: merge fcmpx and cror
Almost all uses of boolean condition-register ops in real code seem to be
the combination fcmpx + cror (e.g. for <= or >=). This merges the two.
2014-10-29 00:30:27 -07:00
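A few lines show why the merge is safe. fcmpu sets exactly one of LT, GT, EQ, or UN in a CR field, and "cror eq, lt, eq" turns EQ into LT|EQ. That is exactly an ordered a <= b, which is false when either operand is NaN. The sketch below models a single CR field with hypothetical helpers and checks the two-step sequence against one host comparison. It ignores the fact that real cror addresses bits across all eight CR fields.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bits of one CR field, most significant first: LT, GT, EQ, SO/UN.
enum : uint8_t { kLT = 8, kGT = 4, kEQ = 2, kUN = 1 };

// What fcmpu writes into a CR field: exactly one bit is set.
uint8_t Fcmp(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) return kUN;  // unordered
  if (a < b) return kLT;
  if (a > b) return kGT;
  return kEQ;
}

// cror restricted to one field: dst_bit = bit_a | bit_b, other bits unchanged.
uint8_t Cror(uint8_t cr, uint8_t dst_bit, uint8_t bit_a, uint8_t bit_b) {
  const bool v = (cr & bit_a) != 0 || (cr & bit_b) != 0;
  return static_cast<uint8_t>(v ? (cr | dst_bit) : (cr & ~dst_bit));
}

// Unmerged: "fcmpu; cror eq, lt, eq", then test EQ.
bool LessEqualTwoStep(double a, double b) {
  return (Cror(Fcmp(a, b), kEQ, kLT, kEQ) & kEQ) != 0;
}

// Merged: one ordered comparison yields the same EQ result, including for NaN.
bool LessEqualMerged(double a, double b) { return a <= b; }
```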
Fiora
a666bb6bf6 JIT: optimize mulhwu 2014-10-29 00:30:26 -07:00
Fiora
5b5e462200 JIT: reorder blr comparisons
This should allow macro-op fusion in blr instructions.
2014-10-29 00:30:26 -07:00
Fiora
7388c62439 JIT: use BLR optimization to avoid anding LR with 0xFFFFFFFC
Should save roughly one instruction per blr.
2014-10-29 00:30:26 -07:00