18345 Commits

Author SHA1 Message Date
Paweł Bylica e399c58778
[DAGCombine] Add tests for D57317
Add two tests for D57317: Deduplicate addcarry node using commutativity.
https://reviews.llvm.org/D57317
2022-10-01 16:59:44 +02:00
Matthias Braun 189900eb14 X86: Stop assigning register costs for longer encodings.
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encodings require a REX
prefix when using them, resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occur when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.

I did extensive measurements with the llvm-test-suite with SPEC2006 +
SPEC2017 included; internal services showed similar patterns. Generally
there are a lot of improvements but also a lot of regressions. But on
average the allocation quality seems to improve at a small code size
regression.

Results for measuring static and dynamic instruction counts:

Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -2.2%, geomean -3.1%
    regalloc.NumReloads   mean -2.6%, geomean -3.9%
    size..text            mean +0.6%, geomean +0.6%

Static / LLVM Statistics:
    regalloc.NumSpills   mean -3.0%
    regalloc.NumReloads  mean -3.3%
    size..text           mean +0.3%, geomean +0.3%

Differential Revision: https://reviews.llvm.org/D133902
2022-09-30 16:01:33 -07:00
Simon Pilgrim ba8e2cb90d [X86] Tweak avx512-gfni-intrinsics.ll tests to avoid xor(select(c,x,0)) 'passthrough' patterns
These can be manipulated by foldSelectWithIdentityConstant and lose the predicate/predicate-zero instruction test coverage - use an insertvalue chain into an aggregate instead to retain all the results.

Noticed while trying to convert foldSelectWithIdentityConstant to use llvm::isNeutralConstant
2022-09-30 15:38:17 +01:00
Serge Pavlov b3913a9cdf [GlobalISel] Do not crash on widening vector result
Function buildCopyToRegs did not properly handle the case when it should
make a wider vector result. It happened, for example, in a function that
returns a value of type <2 x f32>, which should be widened to <4 x f32> to
fit an XMM register. The function eventually calls
MachineIRBuilder.buildUnmerge, which does not expect that only one
destination register is specified.

Now this case is treated specifically in buildCopyToRegs.

Differential Revision: https://reviews.llvm.org/D128546
2022-09-30 21:30:55 +07:00
Amaury Séchet d7600c7ccb [DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1
In the spirit of D130765. Get rid of cbranches and/or cmov. Usually shorter, but sometimes not, because it's hard to predict when a dependency-breaking xor will be introduced.
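
As a rough illustration of why the fold in the commit title preserves the value, the sketch below models a 1-bit condition as a C++ `bool` and the sign extension as "all ones or all zeros" in 32 bits. The helper names (`sextI1`, `selectForm`, `orForm`) and the 32-bit width are assumptions made for this example; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 1-bit condition to 32 bits: true -> all ones, false -> 0.
// (Hypothetical helper, only for this illustration.)
uint32_t sextI1(bool b) { return b ? 0xFFFFFFFFu : 0u; }

// Original form: select Cond, C, -1
uint32_t selectForm(bool cond, uint32_t c) {
  return cond ? c : 0xFFFFFFFFu;
}

// Folded form: or (sext (not Cond)), C
// If cond is true, the mask is 0 and the result is c.
// If cond is false, the mask is all ones and the result is -1.
uint32_t orForm(bool cond, uint32_t c) {
  return sextI1(!cond) | c;
}
```

Both forms agree for either value of the condition, which is all the fold needs; whether the resulting machine code is shorter is a separate question, as the message notes.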

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134736
2022-09-30 00:36:58 +00:00
Bjorn Pettersson 0513b0305a [X86] Avoid miscompile in combineOr (X86ISelLowering.cpp)
In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrites
a "(0 - SetCC) | C" pattern into something simpler given that a LEA
can be used. Another requirement is that C has some specific value,
for example 1 or 7. When checking those requirements the code used a
32-bit unsigned variable to store the value of C. So for a 64-bit OR
this could miscompile in case any of the 32 most significant bits in
C were non zero.

This patch fixes the bug by using a large enough type for the
C value.

The faulty code seems to have been introduced by commit 9bceb8981d
(D131358).
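
The class of bug described above can be sketched outside of LLVM. The two functions below are hypothetical stand-ins for the requirement check, not the code in combineOr; the accepted values 1 and 7 are just the examples named in the message. The narrow version stores the 64-bit constant in a 32-bit variable first, so a constant whose low 32 bits happen to be 1 is wrongly accepted.

```cpp
#include <cassert>
#include <cstdint>

// Buggy pattern (illustrative): the 64-bit constant is truncated to
// 32 bits before it is compared, so the upper 32 bits are ignored.
bool isAcceptedNarrow(uint64_t c) {
  uint32_t val = static_cast<uint32_t>(c);  // drops bits 32..63
  return val == 1 || val == 7;
}

// Fixed pattern (illustrative): compare using a type wide enough
// to hold the whole constant.
bool isAcceptedWide(uint64_t c) {
  return c == 1 || c == 7;
}
```

For example, `0x100000001` passes the narrow check but not the wide one, which is the situation in which a 64-bit OR could be miscompiled.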

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134892
2022-09-29 21:24:31 +02:00
Bjorn Pettersson e4fcbf3950 [X86] Pre-commit test case showing bug in combineOr (X86ISelLowering.cpp)
In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrites
a "(0 - SetCC) | C" pattern into something simpler given that a LEA
can be used. Another requirement is that C has some specific value,
for example 1 or 7. When doing that check it is using a 32-bit
unsigned variable to store the value of C. So for a 64-bit OR this
could miscompile in case any of the 32 most significant bits in C
are set.

This patch adds a test case to show this miscompile bug.

Differential Revision: https://reviews.llvm.org/D134890
2022-09-29 21:24:31 +02:00
Stefan Gränitz 4a617c426d [WinEH] Prepare test win64-funclet-preisel-intrinsics.ll for extension to nested try-catch case (NFC) 2022-09-29 11:30:27 +02:00
Amaury Séchet c78e947d26 Change constant in cmov-promotion to avoid optimizations 2022-09-27 21:14:13 +00:00
Amaury Séchet 4bab490de2 [X86] Add test case for D134736. NFC 2022-09-27 21:07:10 +00:00
Stefan Gränitz ed8409dfa0 [ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows
Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI.

Reviewed By: rnk, fhahn

Differential Revision: https://reviews.llvm.org/D134441
2022-09-27 18:49:40 +02:00
Amaury Séchet d1baed7c9c [DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1
This seems to be beneficial overall, except for midpoint-int.ll.

The X86 backend seems to generate zeroing that is not necessary.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D131260
2022-09-27 12:54:52 +00:00
Simon Pilgrim 8427c836f7 [X86] Clean up prefixes for avoid-sfb.ll
Simplifies the diff for D134697
2022-09-27 11:01:17 +01:00
Amaury Séchet 79b69bf8c9 Autogenerate stack-folding-fp X86 tests. NFC 2022-09-26 14:26:08 +00:00
Han Zhu 67a04edd4e [X86] Pre-commit unit test for D134477 2022-09-24 21:51:35 -07:00
Leonard Chan 79565766be Reland "[llvm] Support forward-referenced globals with dso_local_equivalent"
This reverts commit eef5db2c74.

See https://github.com/llvm/llvm-project/issues/57815.

dso_local_equivalent would fail with an assertion on forward-referenced
globals. This is an issue that only comes up in textual IR, which is why
we've never seen this assertion with clang.

Differential Revision: https://reviews.llvm.org/D134234
2022-09-23 18:32:07 +00:00
Josh Stone cb46ffdbf4 [X86] Use BuildStackAdjustment in stack probes
This has the advantage of dealing with live EFLAGS, using LEA instead of
SUB if needed to avoid clobbering. That also respects feature "lea-sp".

We could allow unrolled stack probing from blocks with live-EFLAGS, if
canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used.

Differential Revision: https://reviews.llvm.org/D134495
2022-09-23 09:30:32 -07:00
Josh Stone 26c37b461a [X86] Don't allow prologue stack probing with live EFLAGS
Fixes https://github.com/llvm/llvm-project/issues/49509

Differential Revision: https://reviews.llvm.org/D134494
2022-09-23 09:30:32 -07:00
Leonard Chan eef5db2c74 Revert "[llvm] Support forward-referenced globals with dso_local_equivalent"
This reverts commit 411020ad1c.

One of the tests here fails on some upstream builders:
https://lab.llvm.org/buildbot#builders/16/builds/35314
2022-09-21 20:14:30 +00:00
Leonard Chan 411020ad1c [llvm] Support forward-referenced globals with dso_local_equivalent
See https://github.com/llvm/llvm-project/issues/57815.

dso_local_equivalent would fail with an assertion on forward-referenced
globals. This is an issue that only comes up in textual IR, which is why
we've never seen this assertion with clang.

Differential Revision: https://reviews.llvm.org/D134234
2022-09-21 19:31:35 +00:00
Serge Pavlov 181279ffcd [X86][GlobalISel] Add support for sret demotion
The change adds support for the cases when the return value is passed in
memory rather than in registers.

Differential Revision: https://reviews.llvm.org/D134181
2022-09-20 11:47:53 +07:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows us to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Kazu Hirata cf07277fb4 [X86] Fix the LEA optimization pass
The LEA optimization pass visits each basic block of a given machine
function.  In each basic block, for each pair of LEAs that differ only
in their displacement fields, we replace all uses of the second LEA
with the first LEA while adjusting the displacement.

Now, without this patch, after all the replacements are made, the
following assert triggers:

        assert(MRI->use_empty(LastVReg) &&
               "The LEA's def register must have no uses");

The replacement loop uses:

  for (MachineOperand &MO :
       llvm::make_early_inc_range(MRI->use_operands(LastVReg))) {

which is equivalent to:

  for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end();
       UI != UE;) {
    MachineOperand &MO = *UI++;  // <-- Look!

That is, immediately after the post increment, make_early_inc_range
already has the iterator for the next iteration in its mind.

The problem is that in one iteration of the loop, we could replace two
uses in a debug instruction like:

  DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ...

So, the iterator for the next iteration becomes invalid.  We end up
traversing a garbage use list from that point on.  In turn, we don't
get to visit remaining uses.

The patch fixes the problem by switching to a "draining" while loop:

  while (!MRI->use_empty(LastVReg)) {
    MachineOperand &MO = *MRI->use_begin(LastVReg);
    MachineInstr &MI = *MO.getParent();
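
The draining pattern can be tried out on a toy model. The types below (`Operand`, `Instr`, `RegInfo`) are simplified stand-ins invented for this sketch and are not the MachineRegisterInfo API; the only point is that the loop re-reads the head of the use list on every iteration, so replacing several operands of one instruction at once cannot leave it holding a stale iterator.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <vector>

struct Instr;
struct Operand { int reg; Instr *parent; };
struct Instr { std::vector<Operand> ops; };

// Toy stand-in for a register's use list bookkeeping.
struct RegInfo {
  std::map<int, std::list<Operand *>> uses;
  bool useEmpty(int reg) { return uses[reg].empty(); }
  void setReg(Operand &op, int newReg) {
    uses[op.reg].remove(&op);      // unlink from the old register's uses
    op.reg = newReg;
    uses[newReg].push_back(&op);   // link into the new register's uses
  }
};

// Record all operands of an instruction (call once ops is final).
void registerInstr(RegInfo &ri, Instr &mi) {
  for (Operand &op : mi.ops) {
    op.parent = &mi;
    ri.uses[op.reg].push_back(&op);
  }
}

// Draining loop: assumes oldReg != newReg. Each iteration rewrites every
// operand of one instruction, then looks up the list head again.
void replaceAllUses(RegInfo &ri, int oldReg, int newReg) {
  while (!ri.useEmpty(oldReg)) {
    Operand &mo = *ri.uses[oldReg].front();
    Instr &mi = *mo.parent;
    for (Operand &op : mi.ops)
      if (op.reg == oldReg)
        ri.setReg(op, newReg);
  }
}
```

An instruction with two operands naming the same register, like the DBG_VALUE_LIST above, is handled in a single iteration, and the loop ends exactly when the use list is empty.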

The credit goes to Simon Pilgrim for reducing the test case.

Fixes https://github.com/llvm/llvm-project/issues/57673

Differential Revision: https://reviews.llvm.org/D133631
2022-09-18 17:50:17 -07:00
Sotiris Apostolakis b827e7c600 [SelectOpti] Restrict load sinking
This is a follow-up to D133777, which resolved a use-after-free case but
did not cover all possible memory bugs due to misplacement of loads.

In short, the overall problem was that sunk loads could be moved after
state-modifying instructions leading to memory bugs.

The solution is to restrict load sinking unless it is found to be sound.
i) Within a basic block (to-be-sunk load and select-user are in the same BB),
loads can be sunk only if there is no intervening state-modifying instruction.
This is a conservative approach to avoid resorting to alias analysis to detect
potential memory overlap.
ii) Across basic blocks, sinking of loads is avoided. This is because going over
multiple basic blocks looking for memory conflicts could be computationally
expensive and also unlikely to allow loads to sink. Further, experiments showed
that not sinking these loads has a slight positive performance effect.
Maybe for some of these loads, having some separation allows enough time
for the load to be executed in time for its user. This is not the case for
floating point operations that benefit more from sinking.

The solution in D133777 was essentially undone in this patch,
since the latter is a complete solution to the observed problem.

Overall, the performance impact of this patch is minimal.
Tested on two internal Google workloads with instrPGO.
Search application showed <0.05% perf difference,
while the database one showed a slight improvement,
but not statistically significant.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D133999
2022-09-16 20:50:46 +00:00
Liqiang Tao 2e37557fde StackProtector: ensure stack checks are inserted before the tail call
The IR stack protector pass should insert stack checks before all tail
calls, not only the musttail calls, so that the attributes `sspreq` and
`tail call`, which are emitted by llvm-opt, can both be enabled by
llvm-llc.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D133860
2022-09-16 22:24:46 +08:00
Nikita Popov b4309800e9 [CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692)
Callee save registers must be preserved, so -fzero-call-used-regs
should not be zeroing them. The previous implementation only did
not zero callee save registers that were saved&restored inside the
function, but we need to preserve all of them.

Fixes https://github.com/llvm/llvm-project/issues/57692.

Differential Revision: https://reviews.llvm.org/D133946
2022-09-16 11:52:29 +02:00
Craig Topper ace05124f5 [IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation.
There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check if either of the
inputs to the ctlzs is zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.

Reviewed By: arsenm, aqjune

Differential Revision: https://reviews.llvm.org/D130680
2022-09-15 09:38:02 -07:00
Sotiris Apostolakis eda61fb656 [SelectOpti] Fix lifetime intrinsic bug
When a select is converted to a branch and load instructions are sunk to the true/false blocks,
lifetime intrinsics (if present) could be made unsound if not moved.

This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure
preserved lifetime semantics.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D133777
2022-09-13 19:00:18 +00:00
Simon Pilgrim 8bf04e9f2a [X86] Add GFNI test coverage for bitreverse codegen
We should be able to efficiently use the vector version for scalar bitreverse, like we do for XOP.
2022-09-13 11:23:03 +01:00
Matthias Gehre 6bf1b4e8e0 Move ExpandLargeDivRem to llvm/test/CodeGen/X86 because they need a triple 2022-09-13 08:29:54 +01:00
Matthias Braun d871bce265 Use update_mir_test_checks for some more tests. 2022-09-12 11:35:52 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer
type, this urem will be expanded by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's Delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.
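
The two steps can be sketched in ordinary C++ for the 64-bit case split into 32-bit halves. This is a minimal illustration under stated assumptions, not the LegalizeTypes code: the function names are invented here, the inverse is computed with a Newton iteration chosen for this example, and the divisor must satisfy (1 << 32) % d == 1 (for instance 3, 5, 17 or 257).

```cpp
#include <cassert>
#include <cstdint>

// Remainder of a 64-bit value using only a 32-bit urem.
// Precondition: (1 << 32) % d == 1.
uint32_t uremBySummingHalves(uint64_t x, uint32_t d) {
  assert(d != 0 && (uint64_t{1} << 32) % d == 1);
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t sum = lo + hi;            // may wrap: the true sum has 33 bits
  uint32_t carry = sum < lo ? 1 : 0; // carry out of the 32-bit addition
  sum += carry;                      // 2^32 == 1 (mod d), so add it back
  return sum % d;
}

// Multiplicative inverse of an odd d modulo 2^64 (Newton iteration).
uint64_t inverseMod2Pow64(uint64_t d) {
  uint64_t inv = d;                  // d * d == 1 (mod 8): 3 correct bits
  for (int i = 0; i < 5; ++i)
    inv *= 2 - d * inv;              // each step doubles the correct bits
  return inv;
}

// Quotient: subtract the remainder, then multiply by the inverse.
uint64_t udivViaRemainder(uint64_t x, uint32_t d) {
  uint64_t rem = uremBySummingHalves(x, d);
  return (x - rem) * inverseMod2Pow64(d);
}
```

Adding the carry back cannot overflow: when the 32-bit addition wraps, the wrapped sum is at most 2^32 - 2. The multiplication gives the exact quotient because x - rem is a multiple of d.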

gcc already does this same trick. There are additional tricks gcc
does for urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Craig Topper 545affbf79 [DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression.
getNegatedExpression can delete nodes. If the first call to
getNegatedExpression produced a node that the second call also
manages to create, it might get deleted. Use a HandleSDNode to
ensure it has a use to prevent it from being deleted.

Fixes PR57658.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133602
2022-09-09 22:02:41 -07:00
Craig Topper aa83bdd198 [DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry)
Fixes PR57576.

Differential Revision: https://reviews.llvm.org/D133471
2022-09-08 22:56:46 -07:00
Eric Wang d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
Craig Topper 7c99bf800f [X86] Pre-commit test for PR57576. NFC 2022-09-07 21:04:51 -07:00
Marco Elver 343700358f [AsmPrinter] Emit PCs into requested PCSections
Interpret MD_pcsections in AsmPrinter emitting the requested metadata to
the associated sections. Functions and normal instructions are handled.

Differential Revision: https://reviews.llvm.org/D130879
2022-09-07 11:36:02 +02:00
Xiang1 Zhang c836ddaf72 [X86][NFC] Refine load/store reg to StackSlot for extensibility
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D133078
2022-09-07 14:35:42 +08:00
Xiang1 Zhang 16743c9534 [CodeGen] Limit building time in CodeGenPrepare for huge function
Details:

Currently CodeGenPrepare is very time-consuming when handling big functions.

Old algorithm:
It iterates over each BB in the function and handles every instruction in the BB.
Because some instruction optimizations may affect the BBs' dominator tree,
the old logic re-iterates and tries to optimize each BB again.

Suppose we have a big function with 20000 BBs. If handling the last BB
modifies the dominator tree, we need to re-iterate and try to optimize
all 20000 BBs from the beginning.

The complexity is near N!

And we really do encounter some big tests (> 20000 BBs) that cost more than 30
mins in this pass. (A debug build of the compiler costs 2 hours here.)

What does this patch do for huge functions?
It mainly changes the way the optimization iterates.

1 We do optimizeBlock for each BB (same as the old way).
Meanwhile, if a BB is changed/updated by the optimization, it is
put into FreshBBs (to try optimizeBlock again).
BBs newly created in the previous iteration are also put into FreshBBs.

2 BBs that were not updated in the previous iteration are skipped directly.
Strictly speaking, this may miss some opportunities, but the probability is very
small.

3 For the instructions in a single BB, we do optimizeInst for each instruction.
If optimizeInst changes the instruction dominator in this BB, rather than breaking
out and going back to optimize the first BB (the old way), we directly iterate over
the instructions (to do optimizeInst) in this updated BB again (the new way).

What does this patch do for small/normal (not huge) functions?
It is the same as the old algorithm. (NFC)
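
The difference between the two iteration schemes can be modelled with a small, self-contained sketch. Everything here is a simplification invented for illustration: blocks are plain indices, `optimizeBlock` is a caller-supplied callback that reports whether it changed the block, newly created blocks are not modelled, and the functions only count how many times the callback runs.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Simplified model of the old way: any change restarts from block 0.
int runRestartAll(int numBlocks, const std::function<bool(int)> &optimizeBlock) {
  int calls = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (int bb = 0; bb < numBlocks; ++bb) {
      ++calls;
      if (optimizeBlock(bb)) { changed = true; break; }
    }
  }
  return calls;
}

// Simplified model of the new way: after one full sweep, only blocks
// that changed in the previous round ("fresh" blocks) are revisited.
int runFreshOnly(int numBlocks, const std::function<bool(int)> &optimizeBlock) {
  int calls = 0;
  std::set<int> fresh;
  for (int bb = 0; bb < numBlocks; ++bb) {
    ++calls;
    if (optimizeBlock(bb)) fresh.insert(bb);
  }
  while (!fresh.empty()) {
    std::set<int> next;
    for (int bb : fresh) {
      ++calls;
      if (optimizeBlock(bb)) next.insert(bb);
    }
    fresh = std::move(next);
  }
  return calls;
}
```

In this model, when only the last block changes, the restart scheme repeats the whole sweep while the fresh-set scheme revisits a single block, which is the effect the patch aims for on huge functions.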

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129352
2022-09-07 10:05:40 +08:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionally not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Benjamin Kramer c349d7f4ff [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207
2022-09-06 11:54:34 +02:00
Freddy Ye d5fa8b1c2c [X86] Support SAE for VCVTPS2PH from intrinsic.
For now, clang and gcc both fail to generate the sae version from _mm512_cvt_roundps_ph:
https://godbolt.org/z/oh7eTGY5z. The intrinsic guide description is also wrong, which will be
updated soon.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D132641
2022-09-06 11:28:12 +08:00
Craig Topper 7927c4c5ce [X86] Add test cases for PR57549. NFC 2022-09-05 13:12:18 -07:00
Simon Pilgrim 4e6783f866 [DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554)
Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but weren't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57554
2022-09-05 11:46:46 +01:00
Craig Topper 0d1d36cfa6 [X86] Pre-commit tests for D130862. NFC 2022-09-04 21:19:01 -07:00
Simon Pilgrim 62cdfdab4d [DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support
We already have plenty of assertions in place to ensure that the insertion index is constant and in range
2022-09-03 13:41:33 +01:00
Simon Pilgrim 3968844bff [X86] Add test showing failure to fold freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c)
If at least one of x and y are known never poison.
2022-09-03 13:27:08 +01:00
Nikita Popov 5134bd432f [DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469)
DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads.
If _Unwind_Resume happens to be defined in the same module and
debug info is used, then this leads to a verifier error:

  inlinable function call in a function with debug info must
    have a !dbg location
  call void @_Unwind_Resume(ptr %exn.obj) #0

Fix this by assigning a dummy location to the call. (As this
happens in the backend, inlining is not actually relevant here.)

Fixes https://github.com/llvm/llvm-project/issues/57469.

Differential Revision: https://reviews.llvm.org/D133095
2022-09-01 16:35:49 +02:00
Nick Desaulniers d7474bef77 [llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets
This fixes a crash observed after
https://reviews.llvm.org/D129997.

Similar to D88823.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130127
2022-08-31 13:07:10 -07:00