Commit Graph

30439 Commits

Author SHA1 Message Date
Amara Emerson a35c2c7942 [GlobalISel] Implement fewerElements legalization for vector reductions.
This patch adds 3 methods, one for power-of-2 vectors which use tree
reductions using vector ops, before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.

Differential Revision: https://reviews.llvm.org/D97163
2021-03-30 11:19:21 -07:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Sourabh Singh Tomar f13f050551 [DebugInfo] Support for signed constants inside DIExpression
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.

Test case IR is generated using Fortran front-end.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99273
2021-03-30 23:20:38 +05:30
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Amara Emerson f5e9be6fdb [GlobalISel] Implement lowering for G_ROTR and G_ROTL.
This is a straightforward port.

Differential Revision: https://reviews.llvm.org/D99449
2021-03-30 09:44:41 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Alok Kumar Sharma 9fb0025f70 [DebugInfo] Upgrade DISubragne::count to accept DIExpression also
This is needed for Fortran assumed shape arrays whose dimensions are
defined as,
  - 'count' is taken from array descriptor passed as parameter by
    caller, access from descriptor is defined by type DIExpression.
  - 'lowerBound' is defined by callee.
The current alternate way represents using upperBound in place of
count, where upperBound is calculated in callee in a temp variable
using lowerBound and count

Representation with count (DIExpression) is not only clearer as
compared to upperBound (DIVariable) but it has another advantage that
variable count is accessed by being parameter has better chance of
survival at higher optimization level than upperBound being local
variable.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99335
2021-03-30 09:16:55 +05:30
Rahman Lavaee 90c401cab6 [Propeller] Do not generate the BB address map for empty functions.
Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis.
This simple change skips the BB address map emission for such functions.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D99395
2021-03-29 20:15:01 -07:00
Roger Ferrer Ibanez 489ca73ac4 [PrologEpilogInserter][AMDGPU] Only adjust offset for emergency spill slots if the stack grows down
D89239 adjusts the stack offset of emergency spill slots for overaligned
stacks. However the adjustment is not valid for targets whose stack
grows up (such as AMDGPU).

This change makes the adjustment conditional only to those targets whose
stack grows down.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49686

Differential Revision: https://reviews.llvm.org/D99504
2021-03-29 17:26:58 +00:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant  helper function to check if a
SDValue is a integer of FP constant. This pattern is used in various
places.

There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
Amara Emerson 55533203d7 [GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates.
Differential Revision: https://reviews.llvm.org/D99383
2021-03-25 17:23:30 -07:00
Jessica Paquette 23f657c165 [AArch64][GlobalISel] Emit bzero on Darwin
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.

This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.

To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.

The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.

This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.

Code size improvements (Darwin):

- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)

Differential Revision: https://reviews.llvm.org/D99358
2021-03-25 17:14:25 -07:00
Amara Emerson 0d2c4db637 [GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF.
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.

rdar://75784009

Differential Revision: https://reviews.llvm.org/D99287
2021-03-24 23:08:51 -07:00
Sander de Smalen 55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Serguei Katkov 311d81ce97 [RegAlloc] Fix "ran out of regs" with uses in statepoint
Statepoint instruction is known to have a variable and big number of operands.
It is possible that Register Allocator will split live intervals in the way that all
physical registers are occupied by "zero-length" live intervals which are marked
as not-spillable.
While intervals are marked as not-spillable in the moment of creation when they are
really zero-length it is possible that in future as part of re-materialization there will
need for physical register between def and use of such tiny interval (the use is not
related to this interval at all).
As all physical registers are assigned to not-spillable intervals there is not avaialbe
registers and RA reports an error.

The idea of the fix is avoid marking tiny live intervals where there is a use in statepoint
instruction in var args section. Such interval may be perfectly spilled and folded to
operand of statepoint.

Reviewers: reames, dantrushin, qcolombet, dsanders, dmgreen
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98766
2021-03-24 10:25:34 +07:00
serge-sans-paille e19884cd74 Introduce a generic operator to apply complex operations to BitVector
This avoids temporary and memcpy call when computing large expressions.

It's basically some kind of poor man's expression template, but it seems easier
to maintain to have a single generic `apply` call instead of the whole
expression template machinery here.

Differential Revision: https://reviews.llvm.org/D98176
2021-03-23 14:23:26 +01:00
Matt Arsenault b24436ac96 GlobalISel: Lower funnel shifts 2021-03-23 09:11:17 -04:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Pushpinder Singh d0e5422eb8 [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93963
2021-03-23 05:45:43 +00:00
Max Kazantsev 105dc0f9de [NFC] Fix typo longre -> longer 2021-03-23 12:13:52 +07:00
Rahman Lavaee 949abf7d6a [llvm-readelf, propeller] Add fallthrough bit to basic block metadata in BB-Address-Map section.
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.

This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to the `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decide to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D96918
2021-03-22 21:38:05 -07:00
Sanjay Patel 664d0c052c [TargetTransformInfo] move branch probability query from TargetLoweringInfo
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.

I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
2021-03-22 15:55:34 -04:00
Matt Arsenault 9fdfd8dd52 GlobalISel: Add utility function to constant fold FP ops 2021-03-22 14:38:17 -04:00
Matt Arsenault c34819afe3 GlobalISel: Handle G_BUILD_VECTOR in isKnownToBeAPowerOfTwo 2021-03-22 14:20:35 -04:00
Craig Topper 2f13e63f9e [LegalizeDAG] Add asserts to verify the types of custom legalized operation matches the original node.
We've messed this up a few times recently on RISCV. Experiments
with these asserts found a couple issues on other targets as well.
They've all been cleaned up now so we can put in these asserts to
catch future issues

I had to waive Glue because ADDC/ADDE/etc legalization replaces
Glue with i32 on at least AArch64. X86 used to do the same before
we switched to ADDCARRY. So I guess that's just how that works.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98979
2021-03-22 10:28:51 -07:00
Craig Topper 30080b003e [DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization.
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits)

So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
2021-03-21 11:16:41 -07:00
Matt Arsenault 20a24af01d MIR: Fix missing serialization for HasTailCall 2021-03-21 13:14:04 -04:00
Matt Arsenault 1098acd46d GlobalISel: Avoid unnecessary truncation to i64
We can just directly pass through the APInt to create a new constant.
2021-03-21 10:07:41 -04:00
Simon Pilgrim 64c2641c89 [DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.
2021-03-21 14:01:37 +00:00
Jessica Paquette 4773dd5ba9 [GlobalISel] Add G_SBFX + G_UBFX (bitfield extraction opcodes)
There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG.

E.g, ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain
code that matches a bitfield extract from an and + right shift.

Rather than duplicating code in the same way, this adds two opcodes:

- G_UBFX (unsigned bitfield extract)
- G_SBFX (signed bitfield extract)

They work like this

```
%x = G_UBFX %y, %lsb, %width
```

Where `lsb` and `width` are

- The least-significant bit of the extraction
- The width of the extraction

This will extract `width` bits from `%y`, starting at `lsb`. G_UBFX zero-extends
the result, while G_SBFX sign-extends the result.

This should allow us to use the combiner to match the bitfield extraction
patterns rather than duplicating pattern-matching code in each target.

Differential Revision: https://reviews.llvm.org/D98464
2021-03-19 14:37:19 -07:00
Simon Pilgrim 9d2df96407 [DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.

Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.

Differential Revision: https://reviews.llvm.org/D98857
2021-03-19 16:02:31 +00:00
Simon Pilgrim ffb2887103 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w))
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.

Differential Revision: https://reviews.llvm.org/D98646
2021-03-19 14:14:56 +00:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Simon Pilgrim 1ba5c550d4 [DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x)
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.

Noticed while investigating the poor codegen in D98587.
2021-03-18 15:34:53 +00:00
Matt Arsenault b9a0384983 GlobalISel: Preserve source value information for outgoing byval args
Pass through the original argument IR value in order to preserve the
aliasing information in the memcpy memory operands.
2021-03-18 09:16:54 -04:00
Matt Arsenault 61f834cc09 GlobalISel: Insert memcpy for outgoing byval arguments
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.

Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).

This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
2021-03-18 09:16:54 -04:00
Simon Pilgrim b1afa187c8 [DAG] SelectionDAG::isSplatValue - add ISD::ABS handling
Add ISD::ABS to the existing unary instructions handling for splat detection

This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring.

Differential Revision: https://reviews.llvm.org/D98778
2021-03-18 10:28:29 +00:00
Amara Emerson 28963d895b [GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers.
These are pseudos without any users, so DCE was killing them in the combiner.

Marking them as having side effects doesn't seem quite right since they don't.

Gives a nice 0.3% geomean size win on CTMark -Os.

Differential Revision: https://reviews.llvm.org/D98811
2021-03-17 18:02:08 -07:00
Amara Emerson d7fed7b899 [AArch64][GlobalISel] Fall back if disabling neon/fp in the translator.
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.

We should fall back earlier, in the translator, to avoid this issue.

Differential Revision: https://reviews.llvm.org/D98730
2021-03-17 15:08:08 -07:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
Nikita Popov 40bc309911 Revert "[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration"
This reverts commit d40b4911bd.

This causes a large compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions
2021-03-16 20:41:26 +01:00
Mircea Trofin d40b4911bd [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-03-16 12:10:10 -07:00
serge-sans-paille 35368bbdbb [NFC] Replace loop by idiomatic llvm::find_if 2021-03-16 12:49:19 +01:00
serge-sans-paille 6e040a19db [NFC] Wisely nest dyn_cast in FunctionLoweringInfo
Take advantage of the inheritance tree to avoid a few comparison.
2021-03-16 10:22:44 +01:00
Fangrui Song 5d44c92bf8 Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
Fraser Cormack 0035decae7 [CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs
This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.

When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.

Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.

Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98495
2021-03-15 17:04:21 +00:00
Philip Reames 7d38a91a7f Restore fixed version of "[CodeGenPrepare] Fix isIVIncrement (PR49466)"
Change was reverted in commit 8d20f2c2c6 because it was causing an infinite loop.  9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
2021-03-13 15:25:02 -08:00
Philip Reames 9228f2f322 [CGP] Consolidate logic for getIVIncrement and isIVIncrement
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700).  The root issue was that we have two transforms which are inverses of each other.  We use one for simple induction variables (where we can use the post-inc form), and the other for everything else.  The problem was that the two transforms could disagree about whether something was an induction variable.

The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher.  However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths.

The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence.

Triggering the infinite loop requires side stepping the SunkAddrs cache.  The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache.

Unfortunately, I have not been able to construct a standalone IR test case for this.  The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce.  (Including capturing the IR at point of the transform itself!)  I have no idea what weird state clang is creating here.

I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
2021-03-13 14:55:25 -08:00