Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1
Fixes#57554
The way ComputeNumSignBits was being used was only correct if
OuterBitSize is exactly 2x InnerBitSize. Which is always true,
but not obviously so. Comparing ComputeMaxSignificantBits to
InnerBitSize feels more correct.
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead.
This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
[MachineFunctionPass] Support -filter-passes for -print-changed
-filter-passes specifies a `PassID` (a lower-case dashed-separated pass name,
also used by -print-after, -stop-after, etc) instead of a CamelCasePass.
`-filter-passes=CamelCaseNewPMPass` seems like a workaround for new PM passes before
we can use lower-case dashed-separated pass names (as used by `-passes=`).
Example:
```
# getPassName() is "IRTranslator". PassID is "irtranslator"
llc -mtriple=aarch64 -print-changed -filter-passes=irtranslator < print-changed-machine.ll
```
Close https://github.com/llvm/llvm-project/issues/57453
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D133055
DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads.
If _Unwind_Resume happens to be defined in the same module and
debug info is used, then this leads to a verifier error:
inlinable function call in a function with debug info must
have a !dbg location
call void @_Unwind_Resume(ptr %exn.obj) #0
Fix this by assigning a dummy location to the call. (As this
happens in the backend, inlining is not actually relevant here.)
Fixes https://github.com/llvm/llvm-project/issues/57469.
Differential Revision: https://reviews.llvm.org/D133095
Some targets like RISC-V require operands to be inspected to
determine if an instruction is similar to a move.
Spotted while investigating code differences between using an ADDI
vs an ADDIW. RISC-V has the isAsCheapAsAMove flag for ADDI, but
the TII hook checks the immediate is 0 or the register is X0. ADDIW
is never generated with X0 or with an immediate of 0 so it doesn't
have the isAsCheapAsAMove flag.
I don't know enough about the PRE code to write a test for this yet.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132981
Warn if `.size` is specified for a function symbol. The size of a
function symbol is determined solely by its content.
I noticed this simplification was possible while debugging #57427, but
this change doesn't fix that specific issue.
Differential Revision: https://reviews.llvm.org/D132929
We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding).
NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate!
Fixes Issue #57474
We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1
Fixes#57402
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D127609
The provided testcase would previously fail with an assertion due to later down below trying to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel.
This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered.
Fixes https://github.com/llvm/llvm-project/issues/57349
Differential Revision: https://reviews.llvm.org/D132974
widenScalarDst updates the insert point to after MI, so
widenScalarSrc must be called before widenScalarDst. Otherwise
The updated Src values will appear after MI and break SSA. e.g.:
%14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_
becomes
%14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_
%15:_(s1) = G_TRUNC %16:_(s32)
%17:_(s32) = G_ZEXT %13:_(s1)
Differential Revision: https://reviews.llvm.org/D132547
Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D132793
This patch replaces calls to greatestCommonDivisor with std::gcd where
two arguments are of the same type. This means that
std::common_type_t of the argument type is the same as the argument
type.
We could drop calls to std::abs in some cases, but that's left for
another patch.
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned. This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
This patch replaces getLCMSize with std::lcm, a C++17 feature.
Note that all the arguments are of unsigned with no implicit type
conversion as they are passed to getLCMSize.
Adds a pass ExpandLargeDivRem to expand div/rem instructions
with more than 128 bits into a loop computing that value.
As discussed on https://reviews.llvm.org/D120327, this approach has the advantage
that it is independent of the runtime library. This also helps the clang driver,
which otherwise would need to understand enough about the runtime library
to know whether to allow _BitInts with more than 128 bits.
Targets are still free to disable this pass and instead provide a faster
implementation in a runtime library.
Fixes https://github.com/llvm/llvm-project/issues/44994
Differential Revision: https://reviews.llvm.org/D126644
This patch follows the InstCombine approach of stripping poison generating flags (nsw/nuw from add/sub etc.) to allow us to push a freeze() through the op. Unlike InstCombine it doesn't retain any flags, but we have plenty of DAG folds that do the same thing already. We assert that the newly generated op isGuaranteedNotToBeUndefOrPoison.
Similar to the ValueTracking approach, isGuaranteedNotToBeUndefOrPoison has been updated to confirm that if an op can't create undef/poison and its operands are guaranteed not to be undef/poison - then its not undef/poison. This is just for the generic opcodes - target specific opcodes will need to do this manually just in case they have some special cases.
Differential Revision: https://reviews.llvm.org/D132333
While this does not matter for most targets, when building for Arm Morello,
we have to mark the symbol as a function and add size information, so that
LLD can correctly evaluate relocations against the local symbol.
Since Morello is an out-of-tree target, I tried to reproduce this with
in-tree backends and with the previous reviews applied this results in
a noticeable difference when targeting Thumb.
Background: Morello uses a method similar Thumb where the encoding mode is
specified in the LSB of the symbol. If we don't mark the target as a
function, the relocation will not have the LSB set and calls will end up
using the wrong encoding mode (which will almost certainly crash).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131429
D132080 introduced a bug leading to `RegisterClassInfo` caches not
getting invalidated when there was exactly one more CSR register added.
Differential Revision: https://reviews.llvm.org/D132606
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Relands 67504c9549 with a fix for
32-bit builds.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.
The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this makes a mild impact on the performance;
I've seen up to 0.2%-0.3% perf win on some benchmarks.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D129893
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
Differential Revision: https://reviews.llvm.org/D132520
Based off Issue #57283 - we need to try harder to ensure we're not creating nodes on-the-fly - so make sure we're just using SelectionDAG for analysis where possible
extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode()
Fixes Issue #57283
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.
fixes#57225
Differential Revision: https://reviews.llvm.org/D132237
This patch builds on prior support patches to enable support for
variadic debug values in InstrRefLDV, allowing DBG_VALUE_LISTs to
have their ranges extended.
Differential Revision: https://reviews.llvm.org/D128212
musttail should be honored even in the presence of attributes like "disable-tail-calls". SelectionDAG properly handles this.
Update LangRef to explicitly mention that this is the semantics of musttail.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D132193
This patch adds the last of the changes required to enable
DBG_VALUE_LIST handling in InstrRefLDV, handling variadic debug values
during the transfer tracking step. Most of the changes are fairly
straightforward, and based around tracking multiple locations per
variable in TransferTracker::VLocTracker.
Differential Revision: https://reviews.llvm.org/D128211
In preparation for supporting DBG_VALUE_LIST in InstrRefLDV, this patch
adds the logic for emitting DBG_VALUE_LIST instructions from
InstrRefLDV. The logical changes here are fairly simple, with the main
change being that instead of directly prepending offsets to the DIExpr,
we use appendOpsToArg to modify the expression for individual debug
operands in the expression. The function emitLoc is also changed to take
a list of debug ops, with an empty list meaning an undef value.
Differential Revision: https://reviews.llvm.org/D128209
CodeGenPrepare pass can sink pointer comparison across statepoint
to the point of use (see comment in IR/SafepointIRVerifier.cpp)
Due to specifics of statepoints, it is still legal to have tied
def and use rewritten to the same register in TwoAddress pass.
However, properly updating LiveIntervals and LiveVariables becomes
complicated. For simplicity, let's fall back to generic handling of
tied registers when we detect such case.
TODO: This fixes functional (assertion) failure. Ideally we should
try to recompute new live range/liveness in place.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D132255
In preparation for adding support for DBG_VALUE_LIST instructions in
InstrRefLDV, this patch updates the logic for joining variables at block
joins to support joining variables that use multiple debug operands.
This is one of the more meaty "logical" changes, although the line count
isn't too high - this changes pickVPHILoc to find a valid joined
location for every operand, with part of the function being split off
into pickValuePHILoc which finds a location for a single operand.
Differential Revision: https://reviews.llvm.org/D128180
This interface allows a target to reject a proposed
SMS schedule. For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged. For ARM,
schedules which exceed register pressure limits are
rejected.
Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.
Reapplication of D128941/(reversion:D132037) with small fix.
Differential Revision: https://reviews.llvm.org/D132170
This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.
This patch fixes:
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h:330:5: error:
anonymous types declared in an anonymous union are an extension
[-Werror,-Wnested-anon-types]
In preparation for allowing InstrRefBasedLDV to handle DBG_VALUE_LIST,
this patch updates the internal representation that it uses to represent
debug values to store a list of values. This is one of the more
significant changes in terms of line count, but is fairly simple and
should not affect the output of this pass.
Differential Revision: https://reviews.llvm.org/D128177
`RegisterClassInfo` caches information like allocation orders and reuses
it for multiple machine functions where possible. However the `MCPhysReg
*CalleeSavedRegs` field used to test whether the set of callee saved
registers changed did not work: After D28566
`MachineRegisterInfo::getCalleeSavedRegs()` can return dynamically
computed CSR sets that are only valid while the `MachineRegisterInfo`
object of the current function exists.
This changes the code to make a copy of the CSR list instead of keeping
a possibly invalid pointer around.
Differential Revision: https://reviews.llvm.org/D132080
Currently, InstrRefLDV only handles DBG_VALUE instructions, not
DBG_VALUE_LIST, and as a result of this it handles these instructions
using functions that only work for that type of debug value, i.e. using
getOperand(0) to get the debug operand. This patch changes this to use
the generic debug value functions, such as getDebugOperand and
isDebugOffsetImm, as well as adding an IsVariadic field to the
DbgValueProperties class and a few other minor changes to acknowledge
DBG_VALUE_LISTs. Note that this patch does not add support for
DBG_VALUE_LIST here, but is a precursor to other patches that do add
that support.
Differential Revision: https://reviews.llvm.org/D128174
In the InstrRefBasedImpl for LiveDebugValues, we attempt to propagate
debug values through basic blocks in part by checking to see whether all
a variable's incoming debug values to a BB "agree", i.e. whether their
properties match and they refer to the same underlying value.
Prior to this patch, the check for agreement between incoming values
relied on exact equality, which meant that a VPHI and a Def DbgValue
that referred to the same underlying value would be seen as disagreeing.
This patch changes this behaviour to treat them as referring to the same
value, allowing the shared value to propagate into the BB.
Differential Revision: https://reviews.llvm.org/D125953
Following the comment's thread of D117235, I added checks for the widening + splitting case, which also causes a split with one of the resulting vectors to be empty. Due to the same issues described in that same thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires to manually split the loaded vector.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121784
Test for a case we observed after the initial implementation of D129997
landed, in which case we observed a crash while building the ppc64le
Linux kernel. In that case, we had one block with two exits, both to the
same successor. Removing one of the exits corrupted the
successor/predecessor lists.
So when we have an INLINEASM_BR, check a few things for each indirect
target:
1. that it exists.
2. that it is listed in our successors.
3. that its predecessor list contains the parent MBB of INLINEASM_BR.
This would have caught the regression discovered after D129997 landed,
after the pass that was problematic (early-tailduplication) rather than
getting a stack trace in a later pass (regalloc) that doesn't understand
the anomaly and crashes.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130290
Registers used for arguments are listed as "live-ins" into the starting
basic block. This means we don't have to go through a potentially
expensive search through all possible argument registers when we only
care about used argument registers.
Differential Revision: https://reviews.llvm.org/D132181
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D131220
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.
Depends on D125194
Differential Revision: https://reviews.llvm.org/D130533
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.
E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.
Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403
This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.
Differential Revision: https://reviews.llvm.org/D131521
Improve copy statistics:
- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination is the same physical registers; they will be removed in the `virtregmap` pass anyway.
Differential Revision: https://reviews.llvm.org/D131932
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO).
Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service.
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D131824
This reverts commit 8c4aea438c.
Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D132037
This interface allows a target to reject a proposed
SMS schedule. For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged. For ARM,
schedules which exceed register pressure limits are
rejected.
Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D128941
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.
The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.
This patch also adds a case to the original test for the subrange checking that trigger the issue described above.
The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278
And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209
Essentially the greedy register allocator attempts to move the following instruction:
```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```
From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:
```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r L0000000000000003 [2224r,3440r:0) 0@2224r L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```
So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.
On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.
With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.
Original Author: npmiller
Reviewed by: rampitec
Submitted by: rampitec
Differential Revision: https://reviews.llvm.org/D131884
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.
Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.
Fixes#56922
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D131910
Static variables declared within a routine or lexical block should
be emitted with a non-qualified name. This allows the variables to
be visible to the Visual Studio watch window.
Differential Revision: https://reviews.llvm.org/D131400
When no landing pads exist for a function, `@LPStart` is undefined and must be omitted.
EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131626
Currently there is no way to add in development features to the ML
regalloc evict advisor which is useful to have when working on feature
engineering/improving the current model. This patch adds in the ability
to add in development features to the ML regalloc evict advisor which
are gated by a runtime flag and not added in at all if not compiled in
LLVM development mode. This sets the stage for future work where we are
planning on upstreaming some of the newer features that we are currently
experimenting with.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D131209
This is a followup to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either Illegal types are OK, or that the element types of a
buildvector are legal and bigger than or equal to the size of the
original elements.
Differential Revision: https://reviews.llvm.org/D131883
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).
Differential Revision: https://reviews.llvm.org/D128483
The outer signext_inreg is redundant in the following:
Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
-> (extract_subvector (signext iN_value to iM))
Tests are precommitted and clone those by analogy from the AND case in
the same file. Add a negative test to check extension width is handled
correctly.
This patch supersedes D130700.
Differential Revision: https://reviews.llvm.org/D131503
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison