Commit Graph

32929 Commits

Author SHA1 Message Date
Matt Arsenault 740f920a1f LiveRegUnits: Break register loop when a clobber is encountered 2022-09-13 10:15:08 -04:00
Matt Arsenault b7dae832e6 DeadMachineInstructionElim: Don't repeat per-function init
This was happening for every iteration but only needs to be done once.
2022-09-13 08:19:54 -04:00
Matt Arsenault 243632c63e LiveRegUnits: Cleanup isReg checks
This is the common case and should be checked first. Provides a very
marginal compile time improvement on the example I'm looking at.
2022-09-13 08:19:53 -04:00
Pavel Samolysov 02aaf8e3d6 [NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGInstrs.[cpp|h] files were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
2022-09-13 12:49:04 +03:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
David Green f124e59b2e [TypePromotionPass] Don't treat phi's as ToPromote
This attempts to stop the type promotion pass transforming where it is
not profitable, by not marking PhiNodes as ToPromote and being more
aggressive about pulling extends out of loops.

Differential Revision: https://reviews.llvm.org/D133203
2022-09-13 08:57:15 +01:00
Matt Arsenault 920b2e65fc DeadMachineInstructionElim: Fix typo 2022-09-12 20:10:33 -04:00
Aiden Grossman eec183c171 [nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality
This patch refactors SlotIndex::getInstrDistance to
SlotIndex::getApproxInstrDistance to better describe the actual
functionality of this function. This patch also adds in some additional
comments better documenting the assumptions that this function makes to
increase clarity.

Based on discussion on the LLVM Discourse:
https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5

Reviewed By: mtrofin, foad

Differential Revision: https://reviews.llvm.org/D133386
2022-09-12 23:33:35 +00:00
Amara Emerson f24f469223 [GlobalISel] Fix crash when lowering G_SELECT of pointer vectors.
The bit masking lowering only works for vectors of scalars, so for pointer
element types we need to add some casting.

Differential Revision: https://reviews.llvm.org/D133672
2022-09-13 00:01:37 +01:00
Matt Arsenault d90f7cb559 LiveRegUnits: Do not use phys_regs_and_masks
Somehow DeadMachineInstructionElim is about 3x slower when using it.
Hopefully this reverses the compile time regression reported for
b5041527c7.
2022-09-12 17:21:24 -04:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
YongKang Zhu 481a32f587 Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut
MachineOperand::getRegMask() returns a pointer to register mask.  We should hash the raw content of register mask instead of its pointer.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D133637
2022-09-12 13:25:04 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Jay Foad 210e6a993d [GlobalISel] Simplify extended add/sub to add/sub with carry
Simplify extended add/sub (with carry-in and carry-out) to add/sub with
carry (with carry-out only) if carry-in is known to be zero.

Differential Revision: https://reviews.llvm.org/D133702
2022-09-12 17:05:44 +01:00
Matt Arsenault e30271169f RegAllocGreedy: Try local instruction splitting with subranges
This was only trying this to relax register class constraints, but
this can also help if there are subranges involved.

This solves a compilation failure for AMDGPU when there is high
pressure created by large register tuples. If one virtual register is
using most of the available budget, we need to be able to evict
subranges.

This solves the immediate failure, but this solution leaves a lot to
be desired. In the relevant testcases, we have 32-element tuples but
most of the uses are operations on 1 element subranges of it. What
we're now getting is a spill and restore of the full 1024 bits and an
extract of the used 32-bits. It would be far better if we introduced a
copy to a new virtual register with a smaller register class and used
narrower spills.

Furthermore, we could probably do a better job if the allocator were
to introduce new subranges where none previously existed in the
highest pressure scenarios. The block and region splits should also
try to split specific subranges out.

The mve-vst3.ll test changes looks like noise to me, but instruction
count increased by one. mve-vst4.ll looks like a solid improvement
with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir
also shows a solid reduction in total spill count.

This could use more tests but it's pretty tiring to come up with cases
that fail on this.
2022-09-12 09:03:55 -04:00
Pavel Samolysov f045d0c392 [NFC][ScheduleDAG] Use a reference to iterate over NodeSuccs/ChainSuccs 2022-09-12 15:54:48 +03:00
Pavel Samolysov 05946c144d [NFC][ScheduleDAG] Use structure bindings and emplace_back
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGRRList.cpp file were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
2022-09-12 15:54:48 +03:00
Matt Arsenault c34679b2e8 DAG: Sink some getter code closer to uses 2022-09-12 08:38:35 -04:00
Matt Arsenault bb70b5d406 CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer
Previously this was assuming piontsToConstantMemory implies
dereferenceable.
2022-09-12 08:38:35 -04:00
Pavel Samolysov 354a3d9c02 [NFC][ScheduleDAG] Use Register and MCPhysReg instead of unsigned 2022-09-12 15:18:11 +03:00
Matt Arsenault b5041527c7 DeadMachineInstructionElim: Switch to using LiveRegUnits
Theoretically improves compile time for targets with many overlapping
registers
2022-09-12 07:55:14 -04:00
Matt Arsenault 6c44a7179f RegAlloc: Use SmallSet instead of std::set
There shouldn't be more than a small handful of hints at most.
2022-09-12 07:55:10 -04:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer, if it needs to. Meaning you
can get warnings if you clobber that with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI so it's pretty confusing why you get that warning out of the blue.

This adds a method to explain a reserved register with X19 as the first one.
The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends
but found that there's a lot of things that call isReservedReg in various
ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
Kazu Hirata af91e2b9db [GlobalISel] Use std::initializer_list::size (NFC) 2022-09-11 12:19:37 -07:00
Manuel Brito b51c6130ef Use PoisonValue instead of UndefValue when RAUWing unreachable code [NFC]
Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value:

- llvm/lib/CodeGen/WinEHPrepare.cpp
`demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes

 - llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
`FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node.

 - llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
`finalize`: Remove all references to removed functions.

- llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
`cleanup`: the result is not used then the inserted instructions are removed.

 - llvm/tools/bugpoint/CrashDebugger.cpp
`TestInts`:  the program is cloned and instructions are removed to narrow down source of crash.

Differential Revision: https://reviews.llvm.org/D133640
2022-09-10 14:28:01 +01:00
Craig Topper 545affbf79 [DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression.
getNegatedExpression can delete nodes. If the first call to
getNegatedExpression produced a node that the second call also
manages to create, it might get deleted. Use a HandleSDNode to
ensure it has a use to prevent it from being deleted.

Fixes PR57658.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133602
2022-09-09 22:02:41 -07:00
Sebastian Neubauer c7750c522e Add helper func to get first non-alloca position
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.

Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.

Differential Revision: https://reviews.llvm.org/D132554
2022-09-09 15:39:53 +02:00
Craig Topper aa83bdd198 [DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry)
Fixes PR57576.

Differential Revision: https://reviews.llvm.org/D133471
2022-09-08 22:56:46 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Eric Wang d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
Arthur Eubanks 7e0a52e8e9 [NFC][MachineFunctionPass] Only lookup pass name if we request printing
Should report the small compile time regression reported in D133055.
2022-09-07 21:38:00 -07:00
Marco Elver 343700358f [AsmPrinter] Emit PCs into requested PCSections
Interpret MD_pcsections in AsmPrinter emitting the requested metadata to
the associated sections. Functions and normal instructions are handled.

Differential Revision: https://reviews.llvm.org/D130879
2022-09-07 11:36:02 +02:00
Marco Elver 31a548021b [GlobalISel] Propagate PCSections metadata to MachineInstr
Propagate (most) PC sections metadata to MachineInstr when GlobalISel is
doing instruction selection.

This change results in support for architectures using GlobalISel (such
as -O0 with AArch64). Not all instructions may be supported yet, and
requires further target-specific handling (such as done for AArch64
pseudo-atomics). Expanding supported instructions is planned on a
case-by-case basis and new use cases for PC sections metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130886
2022-09-07 11:36:02 +02:00
Marco Elver f0d6709e4a [AtomicExpandPass] Always copy pcsections Metadata to expanded atomics
When expanding IR atomics to target-specific atomics, copy all
!pcsections Metadata to expanded atomics automatically.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130885
2022-09-07 11:36:01 +02:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Marco Elver 4c58b00801 [SelectionDAG] Propagate PCSections through SDNodes
Add a new entry to SDNodeExtraInfo to propagate PCSections through
SelectionDAG.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130882
2022-09-07 11:22:50 +02:00
Xiang1 Zhang 16743c9534 [CodeGen] Limit building time in CodeGenPrepare for huge function
Details:

Currently CodeGenPrepare is very time consuming in handling big functions.

Old Algorithm :
It iterate each BB in function, and go on handle very instructions in BB.
Due to some instruction optimizations may affect the BBs' dominate tree.
The old logic will re-iterate and try optimize for each BB.

Suppose we have a big function with 20000 BBs, If we handled the last BB
with fine tuning the dominate tree. We need totally re-iterate and try optimize
the 20000 BBs from the beginning.

The Complex is near N!

And we really encounter somes big tests (> 20000 BBs) that cost more than 30
mins in this pass. (Debug version compiler will cost 2 hours here)

What this patch do for huge function ?
It mainly changes the iteration way for optimization.

1 We do optimizeBlock for each BB (that is same with old way).
And, in the meaning time, If BB is changed/updated in the optimization, it will
be put into FreshBBs (try do optimizeBlock again).
The new created BB at previous iteration will also put into FreshBBs.

2 For the BBs which not updated at previous iteration, we directly skip it.
Strictly speaking, here may miss some opportunity, but the probability is very
small.

3 For Instructions in single BB, we do optimizeInst for each instruction.
If optimizeInst change the instruction dominator in this BB, rather than break
and go back to optimize the first BB (the old way), we directly iterate
instructions (to do optimizeInst) in this updated BB again (the new way).

What this patch do for small/normal (not huge) function ?
It is same with the Old Algorithm. (NFC)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129352
2022-09-07 10:05:40 +08:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Amara Emerson fe7c3b87ce Add parantheses to silence warning. 2022-09-06 15:36:19 +01:00
Marco Elver 7d63983c65 [SelectionDAG] Properly copy ExtraInfo on RAUW
During SelectionDAG legalization SDNodes with associated extra info may
be replaced with a new SDNode. Preserve associated extra info on
ReplaceAllUsesWith and remove entries in DeallocateNode.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130881
2022-09-06 16:32:50 +02:00
Marco Elver cc3faf4226 [SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo
For information infrequently attached to SDNodes, it is useful to
provide a way to add this information out-of-line. This is already done
for call-site specific information.

Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding
additional information not necessarily related to call sites only.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130880
2022-09-06 16:32:50 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Marco Elver 42836e283f [MachineInstr] Allow setting PCSections in ExtraInfo
Provide MachineInstr::setPCSection(), to propagate relevant metadata
through the backend. Use ExtraInfo to store the metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130876
2022-09-06 15:52:44 +02:00
Amara Emerson 3dd861818a [GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef.
Differential Revision: https://reviews.llvm.org/D133309
2022-09-06 13:45:04 +01:00
Benjamin Kramer c349d7f4ff [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207
2022-09-06 11:54:34 +02:00
Daniil Fukalov 51d33afcbe [RegisterCoalescer] Fix crash on early clobbered subreg operands.
The issue was with processing two subregs of the same reg are used in the same
instruction (e.g. inline asm): "def early-clobber" and other just "def".
Register coalescer ran in bad recursion if the early clobbered subreg is second
in the following sequence of COPYs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127136
2022-09-06 08:42:37 +03:00
Daniil Fukalov 99d364d1f4 [MachineVerifier] Fix crash on early clobbered subreg operands.
MachineVerifier tried to checkLivenessAtDef() ignoring it is actually a subreg.

The issue was with processing two subregs of the same reg are used in the same
instruction (e.g. inline asm): "def early-clobber" and other just "def".

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D126661
2022-09-05 17:08:21 +03:00
David Sherwood ffa6267300 [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

  CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Differential Revision: https://reviews.llvm.org/D117499
2022-09-05 15:05:14 +01:00