Commit Graph

31452 Commits

Author SHA1 Message Date
Chih-Ping Chen 2ed29d87ef [CodeView] Fortran debug info emission in Code View.
Differential Revision: https://reviews.llvm.org/D112826
2021-11-02 15:06:21 -04:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense, correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Jay Foad be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Kazu Hirata 6bdb61c58a [CodeGen] Use make_early_inc_range (NFC) 2021-11-01 22:38:49 -07:00
Jay Foad b8016b626e [CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC. 2021-11-01 16:01:24 +00:00
Jun Ma 1f9fa54984 [Taildup] Don't tail-duplicate loop header with multiple successors as its latches
when Taildup hit loop with multiple latches like:
  //    1 -> 2 <-> 3                 |
  //          \  <-> 4               |
  //           \   <-> 5             |
  //            \---> rest           |
it may transform this loop into multiple loops by duplicate loop header.
However, this change may has little benefit while makes cfg much complex.
In some uncommon cases, it causes large compile time regression (offered by
@alexfh in D106056).

This patch disable tail-duplicate of such cases.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D110613
2021-11-01 15:32:00 +08:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Kazu Hirata 1a605f395f [CodeGen] Use make_early_inc_range (NFC) 2021-10-31 07:57:36 -07:00
Kazu Hirata 72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Kazu Hirata 4cc7c4724f [MachineCSE] Use make_early_inc_range (NFC) 2021-10-30 19:00:23 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
Christudasan Devadasan aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
Fraser Cormack 8314a04ede [SelectionDAG] Allow FindMemType to fail when widening loads & stores
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.

Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.

This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112000
2021-10-29 18:27:31 +01:00
Neubauer, Sebastian c78640ee6a [TailDuplicator] Fix merging block with terminator
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.

Abort merging if the first block ends with a terminator.

Differential Revision: https://reviews.llvm.org/D112226
2021-10-29 10:52:46 +02:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Guozhi Wei 1e46dcb77b [TwoAddressInstructionPass] Put all new instructions into DistanceMap
In function convertInstTo3Addr, after converting a two address instruction into
three address instruction, only the last new instruction is inserted into
DistanceMap. This is wrong, DistanceMap should track all instructions from the
beginning of current MBB to the working instruction. When a two address
instruction is converted to three address instruction, multiple instructions may
be generated (usually an extra COPY is generated), all of them should be
inserted into DistanceMap.

Similarly when unfolding memory operand in function tryInstructionTransform
DistanceMap is not maintained correctly.

Differential Revision: https://reviews.llvm.org/D111857
2021-10-28 11:11:59 -07:00
Nicolai Hähnle b437aaa672 MachineDominators: Define MachineDomTree type alias
This is a (very) small move towards making the machine dominators more
aligned with the IR dominators:

* DominatorTree / MachineDomTree is the class holding the dominator tree
* DominatorTreeWrapperPass / MachineDominatorTree is the corresponding
  (machine) function pass

This alignment will be used by analyses that are designed as templates
that work with LLVM IR as well as Machine IR.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D112690
2021-10-28 22:30:35 +05:30
Daniel Kiss 66e03db814 Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume.""
This reverts commit b6420e575f.
2021-10-28 17:24:53 +02:00
Daniel Kiss b6420e575f Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 16:49:19 +02:00
Neubauer, Sebastian 50d8d963e3 [GlobalISel] Simplify RegBankSelect
Save the instruction list of a block before selecting banks.
This allows to cope with moved instructions, even if they are reordered
or splitted into multiple basic blocks.

Differential Revision: https://reviews.llvm.org/D111223
2021-10-28 10:30:55 +02:00
Michael Liao e6a4ba3aa6 [amdgpu] Handle the case where there is no scavenged register.
- When an unconditional branch is expanded into an indirect branch, if
  there is no scavenged register, an SGPR pair needs spilling to enable
  the destination PC calculation. In addition, before jumping into the
  destination, that clobbered SGPR pair need restoring.
- As SGPR cannot be spilled to or restored from memory directly, the
  spilling/restoring of that SGPR pair reuses the regular SGPR spilling
  support but without spilling it into memory. As that spilling and
  restoring points are fully controlled, we only need to spill that SGPR
  into the temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take additional restore block,
  where the restoring code is filled. After that, the relaxation will
  place that restore block directly before the destination block and
  insert an unconditional branch in any fall-through block into the
  destination block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106449
2021-10-27 18:37:27 -04:00
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Jay Foad b9e3af124b [LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator)
Add an optional bool RemoveDeadValNo argument to the
removeSegment(iterator) overload, for consistency with the other
overloads. This gives clients a way to remove dead valnos while also
getting an updated iterator returned (in the manner of vector::erase).

Use this to clean up some inefficient code in
LiveIntervals::repairOldRegInRange. NFC.

Differential Revision: https://reviews.llvm.org/D110560
2021-10-27 09:43:32 +01:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
Kazu Hirata c3e698e2f5 [CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC) 2021-10-26 09:01:29 -07:00
Craig Topper d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Jeremy Morse 4136897bd4 [DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.

While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.

Differential Revision: https://reviews.llvm.org/D112333
2021-10-25 18:07:17 +01:00
Jeremy Morse 97ddf49e43 [DebugInfo][InstrRef] Recover stack-slot tracking performance
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.

The test added runs placeMLocPHIs directly, and tests that:
 * A def of the lower 8 bits of a stack slot causes all aliasing regs to
   have PHIs placed,
 * It doesn't cause the equivalent location to x86's $ah, which isn't
   aliased, to have a PHI placed.

Differential Revision: https://reviews.llvm.org/D112324
2021-10-25 17:31:09 +01:00
Danila Malyutin 7b102fcc91 [CodeGen] Fix dependence breaking for tied operands
Differential Revision: https://reviews.llvm.org/D107582
2021-10-25 18:52:27 +03:00
Jeremy Morse ee3eee71e4 [DebugInfo][InstrRef] Track values fused into stack spills
During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:

    SETCCr %0
    DBG_VALUE %0

to

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.

This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317
2021-10-25 15:14:53 +01:00
Jeremy Morse 2eb96e1711 [DebugInfo][NFC] Avoid a use-after-free
This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.

This patch also swaps an ealier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoiding readers
asking themselves why the two code portions have a different order.
2021-10-25 14:16:30 +01:00
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Tim Northover f9089accba CodeGenPrep: remove all copies of GEP from list if there are duplicates.
Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.
2021-10-25 14:00:02 +01:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Kazu Hirata 1c35973c77 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-10-24 09:32:59 -07:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Jay Foad 2915889d74 [ScheduleDAGInstrs] Call adjustSchedDependency in more cases
This removes a condition and the corresponding FIXME comment, because
the Hexagon assertion it refers to has apparently been fixed, probably
by D76134.

NFCI. This just gives targets the opportunity to adjust latencies that
were set to 0 by the generic code because they involve "implicit pseudo"
operands.

Differential Revision: https://reviews.llvm.org/D112306
2021-10-22 20:03:29 +01:00
Jeremy Morse e7084ceab3 [DebugInfo][Instr] Track subregisters across stack spills/restores
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:

    $eax = MOV32ri 0
    MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
    $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg

This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pari of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result, compile-time performance decreases, alas.

One limitation is that if a PHI occurs inside a stack slot:

    DBG_PHI %stack.0, 1

We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.

Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.

Differential Revision: https://reviews.llvm.org/D112133
2021-10-22 19:20:55 +01:00
Craig Topper 93139a3c32 [LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal.
We might be promoting a large non-power of 2 type and the new type
may need to be split. Once we split it we may have a ctlz/cttz/ctpop
instruction for the split type.

I'm also concerned that we may create large shifts with shift amounts
that are too small.
2021-10-22 11:02:35 -07:00
Simon Pilgrim a5f56342b0 [DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant
EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.
2021-10-22 18:32:14 +01:00
Jeremy Morse d9eebe3cd7 [DebugInfo][InstrRef] Add unit tests for transfer-function building
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.

There are a number of minor defects that get corrected in the process:
 * The unit test was selecting the x86 (i.e. 32 bit) backend rather than
   x86_64's 64 bit backend,
 * COPY instructions weren't actually having their subregister values
   correctly represented in the transfer function. Subregisters were being
   defined by the COPY, rather than taking the value in the source register.
 * SP aliases were at risk of being clobbered, if an SP subregister was
   clobbered.

Differential Revision: https://reviews.llvm.org/D112006
2021-10-22 18:29:03 +01:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Craig Topper 0766aef3f3 [LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.
Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
use larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
2021-10-22 09:10:01 -07:00
Zarko Todorovski 0bd6a9f2d1 [clang/llvm] Inclusive language: replace segregate with separate 2021-10-22 09:59:35 -04:00
Craig Topper 996123e5e8 [TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ.
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.

I have another patch to add more callers of these so I thought
I'd clean up the interface first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112267
2021-10-21 15:35:28 -07:00
Craig Topper ff37b1105d [LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG.
By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORS for the masks used.

v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16
shift. The BITREVERSE expansion applied an AND mask before SHL ops and
after SRL ops. This was done to share the same mask constant for both shifts.
It looks like this patch allows DAG combine to remove the AND mask added
after v16i8 SHL by X86 lowering. This maintains the mask sharing that
BITREVERSE was trying to achieve. Prior to this patch it looks like
we kept the mask after the SHL instead which required an extra constant
pool or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
2021-10-21 15:23:23 -07:00
Craig Topper 458ed5fcc3 [TargetLowering][RISCV] Prevent scalarization of fixed vector bswap.
It's better to do the ands, shifts, ors in the vector domain than
to scalarize it and do those operations on each element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112248
2021-10-21 14:34:01 -07:00
Yonghong Song f6811cec84 [DebugInfo] Support typedef with btf_decl_tag attributes
Clang patch ([1]) added support for btf_decl_tag attributes with typedef
types. This patch added llvm support including dwarf generation.
For example, for typedef
   typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
   __u u;
the following shows llvm-dwarfdump result:
   0x00000033:   DW_TAG_typedef
                   DW_AT_type      (0x00000048 "unsigned int *")
                   DW_AT_name      ("__u")
                   DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                   DW_AT_decl_line (1)

   0x0000003e:     DW_TAG_LLVM_annotation
                     DW_AT_name    ("btf_decl_tag")
                     DW_AT_const_value     ("tag1")

   0x00000047:     NULL

  [1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129
2021-10-21 08:42:58 -07:00
Sanjay Patel d2198771e9 [DAGCombiner] fold bit-hack form of usubsat
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR....that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).

On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
2021-10-21 09:47:19 -04:00
Kerry McLaughlin 0d153df69e [SVE] Fix selection failure when splitting extended masked loads
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996
2021-10-21 13:04:38 +01:00
Arthur Eubanks 6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Jon Roelofs b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin c80d8a8cea [AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefor I think it shall only be enabled after D107677 even
though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859
2021-10-20 11:47:24 -07:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Fraser Cormack eabf11f9ea [CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141
2021-10-20 16:45:55 +01:00
Craig Topper fe1f0de003 [RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.

This is a huge improvement to the RISCV which doesn't have a scalar
ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
2021-10-20 07:46:41 -07:00
Jeremy Morse 89950ade21 [DebugInfo][InstrRef] Track a single variable at a time
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
 * It's easier to reason about one variable at a time in your mind,
 * It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799
2021-10-20 15:03:52 +01:00
Sander de Smalen be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Jeremy Morse 849b17949f [DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons
This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long
term storage in place, thus avoiding two DenseMap comparisons and two
DenseMap assignments, which can be expensive.

Differential Revision: https://reviews.llvm.org/D111716
2021-10-19 11:10:14 +01:00
Jeremy Morse cf033bb2d3 [DebugInfo][NFC] Zero-initialize a class field
This field gets assigned when the relevant object starts being used; but it
remains uninitialized beforehand. This risks introducing hard-to-detect
bugs if something changes, so zero-initialize the field.
2021-10-19 10:24:12 +01:00
Alexandros Lamprineas 04dc68710a [DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals
When compiling for the RWPI relocation model the debug information is wrong:

* the debug location is described as { DW_OP_addr Var }
  instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
2021-10-18 21:29:46 +01:00
Nikita Popov 54d868991a [ExpandMemCmp] Update CFG before DTU
The applyUpdates() API requires that the CFG is already updated,
so make sure to insert the new terminator first.
2021-10-18 21:49:47 +02:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Kazu Hirata 8568ca789e Use llvm::erase_if (NFC) 2021-10-18 09:33:42 -07:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Jeremy Morse ea970661dc Fix signed/unsigned comparison after b5426ced71
gcc11 warns that this counter causes a signed/unsigned comaprison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct, clang does not warn one way or the other.
2021-10-18 10:28:52 +01:00
Jay Foad 012248b0bc Remove the verifyAfter mechanism that was replaced by D111397
Differential Revision: https://reviews.llvm.org/D111872
2021-10-18 10:26:46 +01:00
Jay Foad 36deb9a670 Add new MachineFunction property FailsVerification
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397
2021-10-18 10:26:46 +01:00
Fraser Cormack 3d850d03ae [SelectionDAG] Fix illegal widening of scalable-vector loads
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885
2021-10-18 10:00:00 +01:00
Bing1 Yu f383c53311 [MachineSink] Compile time improvement for large testcases which has many kill flags
We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688
2021-10-18 15:44:07 +08:00
Mingming Liu cfd155c41b [SelectionDAG] Fix typo in option help
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867
2021-10-15 11:27:40 -07:00
Ellis Hoag aa80034ab9 [DebugInfo] retainedTypes should not have subprograms
After D80369, the retainedTypes in CU's should not have any subprograms
so we should not handle that case when emitting debug info.

Differential Revision: https://reviews.llvm.org/D111593
2021-10-15 12:42:25 -04:00
Dávid Bolvanský 6678db00e6 [X86] Enable promotion of i16 popcnt (PR52056)
Solves https://bugs.llvm.org/show_bug.cgi?id=52056

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111507
2021-10-15 15:41:37 +02:00
Kazu Hirata 81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Jeremy Morse b5426ced71 [DebugInfo][InstrRef] Place variable-values PHI using LLVM utilities
This patch is very similar to D110173 / a3936a6c19, but for variable
values rather than machine values. This is for the second instr-ref
problem, calculating the correct variable value on entry to each block.
The previous lattice based implementation was broken; we now use LLVMs
existing PHI placement utilities to work out where values need to merge,
then eliminate un-necessary ones through value propagation.

Most of the deletions here happen in vlocJoin: it was trying to pick a
location for PHIs to happen in, badly, leading to an infinite loop in the
MIR test added, where it would repeatedly switch between register
locations. The new approach is simpler: either PHIs can be eliminated, or
they can't, and the location of the value is a different problem.

Various bits and pieces move to the header so that they can be tested in
the unit tests. The DbgValue class grows a "VPHI" kind to represent
variable value PHIS that haven't been eliminated yet.

Differential Revision: https://reviews.llvm.org/D110630
2021-10-14 14:43:43 +01:00
Simon Pilgrim 88487662f7 [Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI.
Avoids unused assignment scan-build warning.
2021-10-14 13:08:40 +01:00
Jeremy Morse e3e1da20d4 Follow up to a3936a6c19, correctly select LiveDebugValues implementation
Some functions get opted out of instruction referencing if they're being
compiled with no optimisations, however the LiveDebugValues pass picks one
implementation and then sticks with it through the rest of compilation.
This leads to a segfault if we encounter a function that doesn't use
instr-ref (because it's optnone, for example), but we've already decided
to use InstrRefBasedLDV which expects to be passed a DomTree.

Solution: keep both implementations around in the pass, and pick whichever
one is appropriate to the current function.
2021-10-14 11:28:53 +01:00
Jeremy Morse fbf269c71e [DebugInfo][InstrRef] Only calculate IDF for reg units
In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of machine-code program. Sadly that's slower
than the old (but broken) way, this patch attempts to recover some of that
performance.

The key observation: every time we def a register, we also have to def it's
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.

This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues choses to ignore writes to SP at
calls, and now have to ignore writes to SP register units too.

Differential Revision: https://reviews.llvm.org/D111627
2021-10-13 16:08:18 +01:00
Jeremy Morse e845ca2ff1 Follow up a3936a6c19 to work around an old compiler bug
Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.
2021-10-13 13:27:25 +01:00
Jeremy Morse a3936a6c19 [DebugInfo][InstrRef] Use PHI placement utilities for machine locations
InstrRefBasedLDV used to try and determine which values are in which
registers using a lattice approach; however this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates un-necessary PHIs.

This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.

Differential Revision: https://reviews.llvm.org/D110173
2021-10-13 12:49:04 +01:00
Kerry McLaughlin 1a2e90199f [SVE][CodeGen] Add patterns for ADD/SUB + element count
This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441
2021-10-13 11:36:15 +01:00
Heejin Ahn 9261ee32dc [WebAssembly] Make EH work with dynamic linking
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.

1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
   which points to the address of per-function LSDA info. It is more
   convenient to use than `MCSymbol` because it can take additional
   target flags.

2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
   symbol relative to `__memory_base` and generate the `add` node. If
   PIC is disabled, continue to use the absolute address.

3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
   the backend, because it is hard to make it work with dynamic
   linking's loading order. Instead, we make all tag symbols undefined
   in the LLVM backend and import it from JS.

4. Add support for undefined tags to the linker.

Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D111388
2021-10-12 23:28:27 -07:00
Amara Emerson 5abce56edb [GlobalISel] Add support for constant vector folding of binops in CSEMIRBuilder.
Differential Revision: https://reviews.llvm.org/D111524
2021-10-12 11:31:22 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simpliciation
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Jeremy Morse d9fa186a5c Scatter NDEBUG to fix after 838b4a533e
These "dump" methods call into MachineOperand::dump, which doesn't exist
with NDEBUG, thus we croak. Disable LiveDebugValues dump methods when
NDEBUG is turned on to avoid this.
2021-10-12 17:13:15 +01:00
Jay Foad f7ee21aa32 [TwoAddressInstruction] Remove ad hoc machine verification
With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which are nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618
2021-10-12 16:09:18 +01:00
Jeremy Morse 838b4a533e [DebugInfo][NFC] Move LiveDebugValues class to header
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.

Differential Revision: https://reviews.llvm.org/D110165
2021-10-12 16:07:26 +01:00
Yonghong Song 325d000765 [NFC][Attr] rename attribute btf_tag to btf_decl_tag
Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updated the Bitcode and
DebugInfo test cases with new attribute name.

Differential Revision: https://reviews.llvm.org/D111591
2021-10-11 20:57:31 -07:00
Amara Emerson 53ebfa7c5d [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp().
We shouldn't call APInt::getSExtValue() on a >64b value.
2021-10-11 15:55:13 -07:00
Guozhi Wei 6599961c17 [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
This patch contains following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse not only check if the Reg is two address usage,
    but also check after commutation can it be two address usage.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when create a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable. (The COPY src is
    killed)

With these enhancements isProfitableToCommute can do better commute decision,
and finally more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731
2021-10-11 15:28:31 -07:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00