Commit Graph

18 Commits

Author SHA1 Message Date
Francis Visoiu Mistrih a438432acc [FastISel] Skip creating unnecessary vregs for arguments
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.

This re-enables it for FastISel with the corresponding fix.

This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.

FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.

This fix moves the replacement of registers before the emission of
copies for the live-ins.

The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).

A few tests are affected by this:

* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed

Differential Revision: https://reviews.llvm.org/D62361

llvm-svn: 362963
2019-06-10 16:53:37 +00:00
Craig Topper 46e5052b8e [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691
2019-05-25 06:17:47 +00:00
Matt Arsenault 828b685ebe RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously).  Give hinted a
register a bonus when deciding which value to spill.  This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010

Patch by Matthias Braun

llvm-svn: 360887
2019-05-16 12:50:39 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Matt Arsenault c2e35a6f32 RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs
The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.

Surprisingly this is not NFC, as many register are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...

Patch by Matthias Braun

llvm-svn: 356499
2019-03-19 19:01:34 +00:00
Philip Reames db65a5b776 Allow unordered loads to be considered invariant in CodeGen
The actual code change is fairly straight forward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work.

My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM.

Differential Revision: https://reviews.llvm.org/D59375

llvm-svn: 356494
2019-03-19 18:27:18 +00:00
Philip Reames 2153c4b828 [AtomicExpand] Fix a crash bug when lowering unordered loads to cmpxchg
Add tests for wider atomic loads and stores.  In the process, fix a crasher where we appearently handled unorder stores, but not loads, when lowering to cmpxchg idioms.

llvm-svn: 356482
2019-03-19 17:20:49 +00:00
Philip Reames 376c87fcd4 [Tests] Update to newer ISA
There are some issues w/missed opts on older platforms, but that's not the purpose of this test.  Using a newer API points out that some TODOs are already handled, and allows addition of tests to exercise other issues (future patch.)

llvm-svn: 356473
2019-03-19 16:46:56 +00:00
Philip Reames af41b282c5 [Tests] Add tests for reordering of unordered atomics on invariant locations
llvm-svn: 356172
2019-03-14 17:36:58 +00:00
Philip Reames 70d156991c Allow code motion (and thus folding) for atomic (but unordered) memory operands
Building on the work done in D57601, now that we can distinguish between atomic and volatile memory accesses, go ahead and allow code motion of unordered atomics. As seen in the diffs, this allows much better folding of memory operations into using instructions. (Mostly done by the PeepholeOpt pass.)

Note: I have not reviewed all callers of hasOrderedMemoryRef since one of them - isSafeToMove - is very widely used. I'm relying on the documented semantics of each method to judge correctness.

Differential Revision: https://reviews.llvm.org/D59345

llvm-svn: 356170
2019-03-14 17:20:59 +00:00
Philip Reames 8dd9b54d9b [Tests] Add negative folding tests w/fences as requested in D59345
llvm-svn: 356165
2019-03-14 17:05:18 +00:00
Philip Reames 9fc51bae73 [Tests] More unordered atomic lowering tests
This time, focused around narrowing and widening transformations.  Also, include a few simple memory optimization tests to highlight missed oppurtunities.  This is part of building up the test base for D57601.

llvm-svn: 353972
2019-02-13 19:49:17 +00:00
Philip Reames 9239b9a0e2 [Tests] RMW folding tests w/unordered atomic operations
We get a suprising number of these today actually, but some are missed. The main point of this is strengthen the test set for D57601.

llvm-svn: 353966
2019-02-13 18:41:54 +00:00
Philip Reames 430d294f0b [Tests] Add a bunch of tests for load folding w/unordered atomics
llvm-svn: 353964
2019-02-13 18:26:01 +00:00
Philip Reames 5dddedee93 [Tests] First batch of cornercase tests for unordered atomics
Mixture of things we legally can't do, and things we're missing.  Once D57601 is in, the later will serve as a punch list.

llvm-svn: 353959
2019-02-13 18:00:58 +00:00
Philip Reames b5bb4a4ec6 [Test] Add codegen tests for unordered and monotonic integer operations
llvm-svn: 353266
2019-02-06 03:19:04 +00:00