Use it in the AArch64 post-legal combiner. These don't always get folded
because, when the instructions are created, the constants are obscured by
artifacts.
Differential Revision: https://reviews.llvm.org/D106776
Dominator trees were previously used for an optimization related to
`wasm.lsda` but the optimization was removed in D97309. Currently
dominators are not doing anything in this pass. This also removes some
`include` lines that are not needed for it to compile.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D106811
This fixes an assert firing when compiling code that involves 128-bit
integral types.
Previously this would trigger an assertion similar to:
```
Assertion failed: getMinSignedBits() <= 64 && "Too many bits for int64_t", file llvm/include/llvm/ADT/APInt.h, line 1646
```
To get around this, we just saturate those big values.
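A minimal sketch of the saturation idea, assuming an APInt named Val that may be wider than 64 bits (names are illustrative, not the exact code from the patch):
```
#include "llvm/ADT/APInt.h"
#include <cstdint>
#include <limits>

// Clamp an arbitrarily wide APInt to the int64_t range instead of calling
// getSExtValue(), which asserts for values needing more than 64 bits.
int64_t saturateToInt64(const llvm::APInt &Val) {
  if (Val.isSignedIntN(64))
    return Val.getSExtValue();
  return Val.isNegative() ? std::numeric_limits<int64_t>::min()
                          : std::numeric_limits<int64_t>::max();
}
```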
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D105320
During tail duplication, SSA values may be updated and have their uses
replaced with a virtual register, and any debug instructions that use
that value are deleted. This patch fixes the implementation of the debug
instruction deletion to work correctly for debug instructions that use
the SSA value multiple times, by batching deletions so that we don't
attempt to delete the same instruction twice.
Differential Revision: https://reviews.llvm.org/D106557
Late in SelectionDAG we join up instruction numbers with their defining
instructions, if it couldn't be done during the main part of SelectionDAG.
One exception is function arguments, where we have to point a DBG_PHI
instruction at the incoming live register, as they don't have a defining
instruction. This patch adds another exception, for constant physregs, like
aarch64 has.
It may seem wasteful to use two instructions where we could use a single
DBG_VALUE, however the whole point of instruction referencing is to
decouple the identification of values from the specification of where
variable location ranges start.
(Part of my aarch64 work to ease adoption of instruction referencing, as
in the meta comment on D104520)
Differential Revision: https://reviews.llvm.org/D104520
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enables a
variety of simple combines of constants.
Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.
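As a rough illustration (not code from this patch), a typical use of `ISD::matchBinaryPredicate` over two constant vector operands looks like this, assuming SDValue operands N0 and N1; with this change those operands may be either BUILD_VECTOR or SPLAT_VECTOR:
```
// Check that corresponding constant elements of N0 and N1 satisfy a
// predicate; the operands may be BUILD_VECTOR or SPLAT_VECTOR constants.
bool BothNonZero = ISD::matchBinaryPredicate(
    N0, N1, [](ConstantSDNode *LHS, ConstantSDNode *RHS) {
      return !LHS->isNullValue() && !RHS->isNullValue();
    });
```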
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106575
Use DW_FORM_udata to encode the constants for DW_AT_data_member_location.
Summary: In DWARF v3, DW_FORM_data4/8 in
DW_AT_data_member_location are interpreted as location
list pointers. Interpreting constants as pointers is
not expected, so we use DW_FORM_udata to encode the
constants.
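A hedged sketch of the kind of change involved, in DwarfUnit-style code (MemberDie and OffsetInBytes are illustrative names, not necessarily the exact code touched):
```
// Emit the member offset as an unsigned LEB128 constant so that DWARF v3
// consumers don't misread it as a location list pointer.
addUInt(MemberDie, dwarf::DW_AT_data_member_location, dwarf::DW_FORM_udata,
        OffsetInBytes);
```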
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D105687
If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from).
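A rough sketch of the idea at the IR level, assuming access to the source value and DataLayout (not the exact code from the patch):
```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Value.h"

// If the source is provably non-zero, the "is the input zero?" guard branch
// around a cttz/ctlz can be skipped entirely.
bool canSkipZeroCheck(const llvm::Value *Src, const llvm::DataLayout &DL) {
  return llvm::isKnownNonZero(Src, DL);
}
```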
Differential Revision: https://reviews.llvm.org/D106685
I've set up the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it; further opcodes can be handled when we have test coverage.
I'm not aware of any vector freeze test coverage, so the DemandedElts (and Depth) args are not being used yet - but they are in place.
SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added.
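A sketch of how visitFREEZE can use the new query (illustrative only; N0 names the frozen operand):
```
// freeze X is a no-op when X can be neither undef nor poison.
if (DAG.isGuaranteedNotToBeUndefOrPoison(N0, /*PoisonOnly=*/false))
  return N0;
```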
Differential Revision: https://reviews.llvm.org/D106668
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.
Differential Revision: https://reviews.llvm.org/D104471
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104797
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.
Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.
The compiler will tolerate both i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)
Original version of the patch by Mikhail Maltsev.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D105491
Prior to this patch, it skipped the instruction defining VNI when checking if the tainted lanes are used.
In the given example, VRGATHER is an illegal instruction because its DstReg overlaps with SrcReg.
Therefore we need to check the defining instruction as well when there is an earlyclobber constraint.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D105684
The coalescer does not check if register uses are available
at the point of rematerialization. If it attempts to rematerialize
an instruction with such uses it can end up with a use without a def.
LiveRangeEdit performs such a check during rematerialization, so just
call LiveRangeEdit::allUsesAvailableAt() to avoid the problem.
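A hedged sketch of the guard, with illustrative names for the LiveRangeEdit, the candidate def, and the rematerialization point:
```
// Only rematerialize if every register the original instruction reads still
// has the same reaching def at the point we want to rematerialize to.
if (!Edit.allUsesAvailableAt(OrigMI, OrigIdx, UseIdx))
  return false;
```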
Differential Revision: https://reviews.llvm.org/D106396
The existing rule about the operand type is strange. Instead, just say
the operand is a TargetConstant with the right width. (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)
Highlights:
1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization. This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.
Differential Revision: https://reviews.llvm.org/D105673
This patch allows iterating typed enum via the ADT/Sequence utility.
It also changes the original design to better separate concerns:
- `StrongInt` only deals with safe `intmax_t` operations,
- `SafeIntIterator` presents the iterator and reverse iterator
interface but only deals with safe `StrongInt` internally.
- `iota_range` only deals with `SafeIntIterator` internally.
This design ensures that operations are always valid. In particular,
"Out of bounds" assertions fire when:
- the `value_type` is not representable as an `intmax_t`
- iterator operations make internal computation underflow/overflow
- the internal representation cannot be converted back to `value_type`
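A minimal usage sketch of the `Sequence` utility (the integer form, which is part of the existing interface; the typed-enum support added here follows the same iteration pattern):
```
#include "llvm/ADT/Sequence.h"

void example() {
  // Iterates I = 0, 1, 2, 3 over the half-open range [0, 4).
  for (int I : llvm::seq(0, 4))
    (void)I;
}
```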
Differential Revision: https://reviews.llvm.org/D106279
We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they
assume the value types will have been legalized to 32-bits. So this adds
the ability to widen them to both AArch64 & generic GISel
infrastructure.
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.
In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics. The name
can cause confusion for other binary formats (ELF, wasm) which have
implemented or want to implement the "no deduplication" selection kind.
Rename it to be clearer.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D106319
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.
This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.
This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105854
This patch fixes a clearly-broken function that I absent-mindedly bodged
many months ago.
Over in D85749 I landed substituteDebugValuesForInst, which creates
substitution records for all the def operands from one debug-labelled
instruction to the new one. Unfortunately it would crash if the two
instructions had different numbers of operands; I tried to fix this in
537f0fbe82 by adding a "max operand" parameter to the method, but then
didn't actually change the loop bound to take account of this. It passed
all the tests because... well, there wasn't any real test coverage of this
method.
This patch fixes up the loop to be bounded by the MaxOperand bound; and
adds test coverage for the x86-fixup-LEAs calls to this method, so that
it's actually tested.
Differential Revision: https://reviews.llvm.org/D105820
Although this combine checks that there are no load folding barriers between
the loads that it's trying to merge, it was inserting the load at the
MIRBuilder's default insertion point, which is the G_OR use inst.
This was causing a miscompile in the test suite's
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2
Differential Revision: https://reviews.llvm.org/D106251
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.
This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D105918
This reverts commit 2a419a0b99.
The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.
The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.
See discussion on https://reviews.llvm.org/D106053 for more background.
Differential Revision: https://reviews.llvm.org/D106222
I'm going to extend the functionality started in D106058, so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT.
llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so
we weren't actually computing anything. This was causing a miscompile on an
arm64 stage2 bootstrap clang build.
s56 stores are broken down into s32 + s24 stores. During this step
both of those new stores use an anyextended s64 value, resulting in
truncating stores. With s56, the s24 requires another lower step to
make it legal, and we were crashing because we didn't expect non-pow-2
stores to also be truncating.
Differential Revision: https://reviews.llvm.org/D106183
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modification to TwoAddressInstructionPass ensures that the operands of the
ADD instruction have the expected order (the dest register of the LEA should be
the src register of the ADD).
Differential Revision: https://reviews.llvm.org/D104684
Add an assertion that we're calling MaskedElementsAreZero with a vector op and that the DemandedElts arg has a matching width.
Makes the error a lot easier to grok when something else accidentally gets used.
If you attach __attribute__((optnone)) to a function when using
optimisations, that function will use fast-isel instead of the usual
SelectionDAG method. This is a problem for instruction referencing,
because it means DBG_VALUEs of virtual registers will be created,
triggering some safety assertions in LiveDebugVariables. Those assertions
exist to detect exactly this scenario, where an unexpected piece of code is
generating virtual register references in instruction referencing mode.
Fix this by transforming the DBG_VALUEs created by fast-isel into
half-formed DBG_INSTR_REFs, after which they get patched up in
finalizeDebugInstrRefs. The modified test adds a fast-isel mode to the
instruction referencing isel test.
Differential Revision: https://reviews.llvm.org/D105694
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
This adds some level of type safety, allows helper functions to be added for
specific opcodes for free, and also allows us to succinctly check for class
membership with the usual dyn_cast/isa/cast functions.
To start off with, add variants for the different load/store operations with some
places using it.
Differential Revision: https://reviews.llvm.org/D105751
Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops.
I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp().
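A hedged sketch of one such fold in DAGCombiner-style code, assuming the matched operands are named as in the comment (not the exact in-tree code):
```
// select Cond, (add X, Y), (add X, Z) --> add X, (select Cond, Y, Z)
SDValue NewSel = DAG.getSelect(DL, VT, Cond, Y, Z);
return DAG.getNode(ISD::ADD, DL, VT, X, NewSel);
```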
Differential Revision: https://reviews.llvm.org/D106058
The linker can sometimes drop the do_not_dead_strip if it can't associate the
atom with a symbol (the other place to specify no dead-stripping in MachO
files).
This patch adds the forward scan for finding redundant DBG_VALUEs.
This analysis aims to remove redundant DBG_VALUEs by scanning forward
through the basic block, considering the first DBG_VALUE valid
until its first (location) operand is clobbered/modified.
For example:
(1) DBG_VALUE $edi, !"var1", ...
(2) <block of code that does not clobber $edi>
(3) DBG_VALUE $edi, !"var1", ...
...
in this case, we can remove (3).
Differential Revision: https://reviews.llvm.org/D105280
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expands atomic operations post-RA to avoid spilling that might prevent LL/SC progress.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103614
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ignored to allow rematerialization.
Differential Revision: https://reviews.llvm.org/D105836
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).
Differential Revision: https://reviews.llvm.org/D105850
This new MIR pass removes redundant DBG_VALUEs.
After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before LiveDebugValues) a
variable's assignment, but this is being clobbered for function
parameters during SelectionDAG, since it generates new DBG_VALUEs
after COPY instructions even though the parameter has no new assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, then a COPY followed by a DBG_VALUE $virt_reg,
and after the virtregrewrite the $virt_reg gets rewritten into $regX,
we'd end up with a redundant DBG_VALUE.
This breaks the definition of the DBG_VALUE, since some analysis passes
might be built on top of that premise, and this patch tries to fix
the MIR with respect to that.
This first patch performs a backward scan, trying to detect a sequence of
consecutive DBG_VALUEs, and removes all DBG_VALUEs describing one
variable except the last one:
For example:
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
...
in this case, we can remove (1).
Combined with the forward scan that will be introduced in the next patch
(from this stack), the statistics show that RemoveRedundantDebugValues
removes 15032 instructions when using gdb-7.11 as a testbed.
Differential Revision: https://reviews.llvm.org/D105279
Currently we are resolving lane/subregister conflict by visiting
instructions sequentially in current block to see whether there is any
use of the tainted lanes. To save compile time, we are not doing further
check in successor blocks. This sounds reasonable without subregister liveness.
But since we have added subregister liveness tracking capability to
register coalescer, we can easily determine whether we have subregister
liveness conflict by checking subranges. This would help coalescing more
COPYs for target that enables subregister liveness tracking.
Reviewed by: arsenm, qcolombet
Differential Revision: https://reviews.llvm.org/D104509
Previously we relied on pseudo probe descriptors to look up precomputed GUIDs during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUIDs for functions can be computed in place from dwarf names. This eliminates the need to import pseudo probe descs in thinlto, since those descs should be emitted by the original modules.
This significantly reduces thinlto memory footprint in some extreme case where the number of imported modules for a single module is massive.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D105248
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.
This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
Generalize the existing eq/ne case using `extractParts`. The original code only
handled narrowings for types of width 2n->n. This generalization allows for any
type that can be broken down by `extractParts`.
General overview is:
- Loop over each narrow-sized part and do exactly what the 2-register case did.
- Loop over the leftover-sized parts and do the same thing
- Widen the leftover-sized XOR results to the desired narrow size
- OR that all together and then do the comparison against 0 (just like the old
code)
This shows up a lot when building clang for AArch64 using GlobalISel, so it's
worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne
case yet.
Also remove the code in this case that notifies the observer; we're just going
to delete MI anyway so talking to the observer shouldn't be necessary.
Differential Revision: https://reviews.llvm.org/D105161
Let other parts of legalization handle the rest of the node; this allows
re-use of existing optimizations elsewhere.
Differential Revision: https://reviews.llvm.org/D105624
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.
Differential Revision: https://reviews.llvm.org/D104471
The test case here hits machine verifier problems. There are volatile
long loads whose results do not get used, loading into two dead
registers. IfCvt will predicate them and as it does will add implicit
uses of the predicating registers due to thinking they are live in. As
nothing has used the register, the machine verifier disagrees that they
are really live and we end up with a failure.
The registers come from Pristine regs that LivePhysRegs counts as live.
This patch adds a addLiveInsNoPristines method to be used instead in
IfCvt, so that only really live in regs need to be added as implicit
operands.
Differential Revision: https://reviews.llvm.org/D90965
The original motivation for this was to implement moreElementsVector of shuffles
on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...))
which the combiner couldn't handle. It seemed here that the better option,
instead of writing ever-more-complex combines, was to have a way to find
the original "non-artifact" source registers for a given definition, walking
through arbitrary expressions of unmerge/concat/insert. As long as the bits
aren't extended or truncated, this is a pretty simple algorithm that avoids
the need for lots of combines and instead jumps straight to the final result
we want.
I've only used this new technique in 2 places within tryCombineUnmerge, using it
in more general situations resulted in infinite loops in AMDGPU. So for now
it's used when we would otherwise fail to combine and that seems to work.
In order to support looking through G_INSERTs, I also had to add it as an
artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU
started infinite looping since full legalization of G_INSERT doesn't seem to
be there. To work around this, I've temporarily added a CLI option to use the
old behaviour so that the MIR tests will still run and terminate.
Other minor changes include no longer making >128b G_MERGE/UNMERGE legal.
We never had isel support for that anyway and it was a remnant of the legacy
legalizer rules. However being legal prevented the combiner from checking if it
was dead and deleting them.
Differential Revision: https://reviews.llvm.org/D104355
`LegalizerHelper::insertParts` uses `extractGCDType` on registers split into
a desired type and a smaller leftover type. This is used to populate a list
of registers. Each register in the list will have the same type as returned by
`extractGCDType`.
If we have
- `ResultTy` = s792
- `PartTy` = s64
- `LeftoverTy` = s24
When we call `extractGCDType`, we'll end up with two different types appended
to the list:
Part: gcd(792, 64, 24) => s8
Leftover: gcd(792, 24, 24) => s24
When this happens, we'll hit an assert while trying to build a G_MERGE_VALUES.
This patch changes the code for the leftover type so that we reuse the GCD from
the desired type.
e.g.
Leftover: gcd(792, 8, 24) => s8
https://llvm.godbolt.org/z/137Kqxj6j
Differential Revision: https://reviews.llvm.org/D105674
This is to protect against non-sensical instruction sequences being assembled,
which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate.
Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input.
Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly.
A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that suppresses type errors, as a quick escape hatch for tests that were not intended to be type correct.
This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future.
Differential Revision: https://reviews.llvm.org/D104945
LLVM provides target hooks to recognise stack spill and restore
instructions, such as isLoadFromStackSlot, and it also provides post frame
elimination versions such as isLoadFromStackSlotPostFE. These are supposed
to return the store-source and load-destination registers; unfortunately on
X86, the PostFE recognisers just return "1", apparently to signify "yes
it's a spill/load". This patch alters the hooks to correctly return the
store-source and load-destination registers:
This is really useful for debug-info as we it helps follow variable values
as they move on/off the stack. There should be no codegen changes: the only
other users of these PostFE target hooks are MachineInstr::getRestoreSize
and MachineInstr::getSpillSize, which don't attempt to interpret the
returned register location.
While we're here, delete the (InstrRef) LiveDebugValues heuristic that
tries to find the spill source register by looking for a killed reg -- we
should be able to rely on the target hooks for that. This involves
temporarily turning off an InstrRef LiveDebugValues test on AArch64
(patch to re-enable it is in D104521).
Differential Revision: https://reviews.llvm.org/D105428
We keep a record of substitutions between debug value numbers post-isel,
however we never actually look them up until the end of compilation. As a
result, there's nothing gained by the collection being a std::map. This
patch downgrades it to being a vector, that's then sorted at the end of
compilation in LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D105029
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
SelectionDAG's equivalents in ISD::InputArg/OutputArg track the
original argument index. Mips relies on this, and it's currently
reinventing its own parallel CallLowering infrastructure which tracks
these indexes on the side. Add this to help move towards deleting the
custom mips handling.
This is a cleanup patch -- we're now able to support all flavours of
variable location in instruction referencing mode. This patch updates
various tests for debug instructions to be broader: numerous code paths
try to ignore debug instructions, and they now have to ignore the
additional DBG_PHI and DBG_INSTR_REFs that we can generate.
A small amount of rework happens for LiveDebugVariables: as we don't need
to track live intervals through regalloc any more, we can get away with
unlinking debug instructions before regalloc, then re-inserting them after.
Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go
through live interval tracking.
In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs
for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the
final locations where DBG_VALUEs are emitted for vreg arguments.
X86InstrInfo now un-sets the debug instr number on SUB instructions that
get mutated into CMP instructions. As the instruction no longer computes a
subtraction, we can't use it for variable locations.
Differential Revision: https://reviews.llvm.org/D88898
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.
The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:
```
public func f(b: Bool) {
  guard b else {
    return // I would like to set a breakpoint here.
  }
  ...
}
```
The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.
Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.
rdar://79515454
Differential Revision: https://reviews.llvm.org/D105238
We already have reassociation code for Adds and Ors separately in the DAG
combiner; this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.
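Roughly, in DAGCombiner terms (illustrative operand names; not the exact in-tree code):
```
// (add (or X, C), Y) --> (add (add X, Y), C)
// when X and C share no bits (so the Or behaves like an Add) and the Or
// has one use.
if (DAG.haveNoCommonBitsSet(X, C) && Or.hasOneUse()) {
  SDValue NewAdd = DAG.getNode(ISD::ADD, DL, VT, X, Y);
  return DAG.getNode(ISD::ADD, DL, VT, NewAdd, C);
}
```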
Differential Revision: https://reviews.llvm.org/D104765
This patch emits DBG_INSTR_REFs for two remaining flavours of variable
locations that weren't supported: copies, and inter-block VRegs. There are
still some locations that must be represented by DBG_VALUE such as
constants, but they're mostly independent of optimisations.
For variable locations that refer to values defined in different blocks,
vregs are allocated before isel begins, but the defining instruction
might not exist until late in isel. To get around this, emit
DBG_INSTR_REFs in a "half done" state, where the first operand refers to a
VReg. Then at the end of isel, patch these back up to refer to
instructions, using the finalizeDebugInstrRefs method.
Copies are something that I complained about in the original RFC, and I
really don't want to have to put instruction numbers on copies. They don't
define a value: they move them. To address this, during isel salvageCopySSA
interprets:
* COPYs,
* SUBREG_TO_REG,
* Anything that isCopyInstr thinks is a copy.
And follows chains of copies back to the defining instruction that they
read from. This relies on any physical registers that COPYs read being
defined in the same block, or being entry-block arguments. For the former
we can put an instruction number on the defining instruction; for the
latter we can drop a DBG_PHI that reads the incoming value.
Differential Revision: https://reviews.llvm.org/D88896
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.
This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.
Differential Revision: https://reviews.llvm.org/D105129
This reverts commit 8cd35ad854.
It breaks `TestMembersAndLocalsWithSameName.py` on GreenDragon and
Mikael Holmén points out in D104827 that bitcode files created with the
patch cannot be parsed with binaries built before it.
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)
2) Folding two constants in each sub-tree as long as such folding
doesn't break a legal addressing mode.
G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)
AArch64 code size improvements on CTMark with -Os:
```
Program            before   after    diff
pairlocalalign     251048   251044   -0.0%
consumer-typeset   421820   421812   -0.0%
kc                 431348   431320   -0.0%
SPASS              413404   413300   -0.0%
clamscan           384396   384220   -0.0%
tramp3d-v4         370640   370412   -0.1%
lencod             432096   431772   -0.1%
bullet             479400   478796   -0.1%
sqlite3            288504   288072   -0.1%
7zip-benchmark     573796   570768   -0.5%
Geomean difference                   -0.1%
```
Differential Revision: https://reviews.llvm.org/D105069
Add a flag so that a target can choose to use the AsmParser for parsing
inline asm, and set the flag by default for AIX.
-no-integrated-as will override this default if specified explicitly.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D105314
Fixes bugs 50580 (https://bugs.llvm.org/show_bug.cgi?id=50580) and 49446 (https://bugs.llvm.org/show_bug.cgi?id=49446).
When compiling with -g, "DBG_VALUE <reg>" instructions are added in the MIR. If such an instruction is inserted between instructions that use <reg>, then MachineCopyPropagation invalidates <reg>, which causes some copies to not be propagated and causes differences in code generation (e.g. bugs 50580 and 49446). DBG_VALUE instructions should be ignored since they don't actually modify the register.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D104394
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
Previously we used the vector type, but we're loading/storing
individual elements so I think only element alignment should matter.
Noticed while looking at the code for something else so I don't
have a test case.
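For illustration only (assumed names, not necessarily the code touched by this change), per-element alignment during scalarization is typically derived from the base alignment and the element's byte offset:
```
#include "llvm/Support/Alignment.h"
#include <cstdint>

// The alignment of element Index is whatever the base alignment guarantees
// at byte offset Index * EltSizeInBytes.
llvm::Align elementAlign(llvm::Align Base, uint64_t EltSizeInBytes,
                         uint64_t Index) {
  return llvm::commonAlignment(Base, Index * EltSizeInBytes);
}
```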
Differential Revision: https://reviews.llvm.org/D105220
In `IRTranslator::translateGetElementPtr`, when we run into a vector gep with
some scalar operands, we try to normalize those operands using
`buildSplatVector`.
This is fine except for when the getelementptr has a <1 x N> type. In that case
it is treated as a scalar. If we run into one of these then every call to
```
// With VectorWidth = 1
LLT::fixed_vector(VectorWidth, PtrTy)
```
will assert.
Here's an example (equivalent to the added testcase):
https://godbolt.org/z/hGsTnMYdW
To get around this, this patch adds a variable, `WantSplatVector`, which
is true when our vector type ought to actually be represented using a vector.
When it's false, we'll translate as a scalar. This checks if `VectorWidth > 1`.
This fixes this bug:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35496
Differential Revision: https://reviews.llvm.org/D105316
Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.
This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.
This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because of the requirement that the SubVec parameter of the
INSERT_SUBVECTOR node must be smaller than the Vec parameter, which
means that INSERT_SUBVECTOR cannot always have a legal result/operand
types.
Co-Authored-by: Joe Ellis <joe.ellis@arm.com>
Differential Revision: https://reviews.llvm.org/D102766
As added in 47c3fe2a22, we sometimes need to describe a variable value
substitution with a subregister qualifier, to say that "the value is the
lower 32 bits of this 64 bit register def" for example. That then needs
support during LiveDebugValues to interpret the subregister qualifiers,
which is what this patch adds.
Whenever we encounter a DBG_INSTR_REF and find its value by using a
substitution, collect any subregister qualifiers seen. Then, accumulate the
effects of the qualifiers to work out what offset and what size should be
extracted from the defined register. Finally, for the target ValueIDNum,
extract whatever subregister is in the correct position.
Currently, describing a subregister field of a larger value that has been
spilt to the stack, is unimplemented.
Differential Revision: https://reviews.llvm.org/D88894
Since gather lowering can now lower to nodes that may need expansion via
the vector legalizer, do MGATHER lowering via vector legalizer.
Additionally, as part of adding passthru support for fixed typed
gathers, fix passthru support for scalable types.
Depends on D104910
Differential Revision: https://reviews.llvm.org/D104217
Very late in compilation, backends like X86 will perform optimisations like
this:
$cx = MOV16rm $rax, ...
->
$rcx = MOV64rm $rax, ...
Widening the load from 16 bits to 64 bits. Seeing how the lower 16 bits
remain the same, this doesn't affect execution. However, any debug
instruction reference to the defined operand now refers to a 64 bit value,
not a 16 bit one, which might be unexpected. Elsewhere in codegen, there's
often this pattern:
CALL64pcrel32 @foo, implicit-def $rax
%0:gr64 = COPY $rax
%1:gr32 = COPY %0.sub_32bit
Where we want to refer to the definition of $eax by the call, but don't
want to refer to the copies (they don't define values in the way
LiveDebugValues sees it). To solve this, add a subregister field to the
existing "substitutions" facility, so that we can describe a field within
a larger value definition. I would imagine that this would be used most
often when a value is widened, and we need to refer to the original,
narrower definition.
Differential Revision: https://reviews.llvm.org/D88891
This patch changes return type of tryCandidate from void to bool:
1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-methods return the same type.
We may need to change return type of them from bool to some enum
further, to make it less confusing.
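A declaration-level sketch of the change, assuming the GenericScheduler-style parameter list (qualifiers and exact parameters may differ per target):
```
// Before:
//   void tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
//                     SchedBoundary *Zone) const;
// After: returns true when TryCand is preferred over Cand, instead of
// communicating the outcome only through Cand.Reason.
bool tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
                  SchedBoundary *Zone) const;
```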
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103951
This patch adds support to the instruction-referencing LiveDebugValues
implementation for emitting entry values. The instruction referencing
implementations tracking by value rather than location means that we can
get around two of the issues with VarLocs. DBG_VALUE instructions that
re-assign the same value to a variable are no longer a problem, because we
can "see through" to the value being assigned. We also don't need to do
anything special during the dataflow stages: the "variable value problem"
doesn't need to know whether a value is available most of the time, and the
times it does need to know are always when entry values need to be
terminated.
The patch modifies the "TransferTracker" class, adding methods to identify
when a variable is an entry value candidate, and when a machine value is
an entry value. recoverAsEntryValue tests these two things and emits an
entry-value expression if they're true. It's used when we clobber or
otherwise lose a value and can't find a replacement location for the value
it contained.
Differential Revision: https://reviews.llvm.org/D88406
This enables proper lowering of non-byte sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
Previously we didn't preserve the memory type and had to blindly
interpret a number of bytes. Now that non-byte memory accesses are
representable, we can handle these correctly.
Ported from DAG version (minus some weird special case i1 legality
checking which I don't fully understand, and we don't have a way to
query for)
For now, this is NFC and the test changes are placeholders. Since the
legality queries are still relying on byte-flattened memory sizes, the
legalizer can't actually see these non-byte accesses. This keeps this
change self contained without merging it with the larger patch to
switch to LLT memory queries.
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
In various circumstances, when we clobber a register there may be
alternative locations that the value is live in. The classic example would
be a value loaded from the stack, and then clobbered: the value is still
available on the stack. InstrRefBasedLDV was coping with this at block
starts where it's forced to pick a location, however it wasn't searching
for alternative locations when values were clobbered.
This patch notifies the "Transfer Tracker" object when clobbers occur, and
it's able to find alternatives and issue DBG_VALUEs for that location. See:
the added test.
Differential Revision: https://reviews.llvm.org/D88405
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
GlobalISel is relying on regular MachineMemOperands to track all of
the memory properties of accesses. Just the raw byte size is
insufficent to disambiguate all situations. For example, if we need to
split an unaligned extending load, we need to know the number of bits
in the original source value and can't infer it from the result
type. This is also a problem for extending vector loads.
This does decrease the maximum representable size from the full
uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this,
other than places using UINT64_MAX for unknown sizes. This may be an
issue for G_MEMCPY and co., although they can just use unknown size
for large static sizes. This also has potential for backend abuse by
relying on the type when it really shouldn't be relevant after
selection.
This does not include the necessary MIR printer/parser changes to
represent this.
We were trying to expand these if they were going to be expanded
in op legalization so that we generated the minimum number of
operations. We failed to take into account that NVT could be
promoted to another legal type in op legalization.
Hoping this fixes the issue on the VE target reported as a follow
up to D96681. The check line changes were taken from before
1e46b6f401 so this patch does
appear to improve some cases that had previously regressed.
This patch reads machine value numbers from DBG_PHI instructions (marking
where SSA PHIs used to be), and matches them up with DBG_INSTR_REF
instructions that refer to them. Essentially they are two separate parts of
a DBG_VALUE: the place to read the value (register and program position),
and where the variable is assigned that value.
Sometimes these DBG_PHIs can be duplicated, usually by tail duplication.
This corresponds to the SSA structure of the program being destroyed, and
the original PHI being split. When this happens: run LLVMs standard
SSAUpdater utility, to work out what values should appear in which blocks.
The majority of this patch is boilerplate to make use of SSAUpdater.
If there are any additional PHIs on the path between multiple DBG_PHIs and
their using DBG_INSTR_REF, their existence is validated, just in case a
value gets clobbered along the way (see dbg-phis-with-loops.mir for
several examples).
Differential Revision: https://reviews.llvm.org/D86814
- Add standalone metadata parsing support so that machine metadata nodes
can be populated before, and accessed while, MIR is parsed.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103282
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:
* DIExpression can currently be parsed from IR or read from bitcode
as `distinct`, but this property is silently dropped when printing
to IR. This causes accepted IR to fail to round-trip. As DIExpression
appears inline at each use in the canonical form of IR, it cannot
actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
restricted to only appearing in contexts where there is no syntax for
`distinct`, but for consistency it is treated equivalently to
DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
along with adding general support for the inverse restriction I went
ahead and described this in Metadata.def and updated the parser to be
general. Future nodes which have this restriction can share this
support.
The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.
The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.
A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:
!named = !{!0}
!0 = !DIExpression()
Instead we would only accept the equivalent inlined version:
!named = !{!DIExpression()}
This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:
!named = !{!0}
; error: 'distinct' not allowed for !DIExpression()
!0 = distinct !DIExpression()
Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.
Reviewed By: StephenTozer, t-tye
Differential Revision: https://reviews.llvm.org/D104827
This intrinsic blocks floating point transformations by the optimizer.
Author: Pengfei
Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn
Differential Revision: https://reviews.llvm.org/D99675
This patch relands https://reviews.llvm.org/D104454, but fixes some failing
builds on Mac OS which apparently has a different definition for size_t,
that caused 'ambiguous operator overload' for the implicit conversion
of TypeSize to a scalar value.
This reverts commit b732e6c9a8.
Peephole optimizer should not be introducing sub-reg definitions
as they are illegal in machine SSA phase. This patch modifies
the optimizer to not emit sub-register definitions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103408
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.
The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.
There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.
Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.
Differential Revision: https://reviews.llvm.org/D100149
A combination of features ^ that lead to a mismatch of expectations
about how a subprogram definition DIE would be produced with/without a
declaration when taking full -g debug info and inlining it into a -gmlt
CU - specifically when using Split DWARF that doesn't support cross-CU
references, so we have to put the -g debug info into the -gmlt CU, which
gets confusing about which mode is respected.
This patch comes down on respecting the CU the debug info is emitted
into, rather than preserving the full debug info when it's emitted into
the gmlt CU.
This ports the AArch64 SABD and UABD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
an abds/abdu which is zexted to the original type.
Differential Revision: https://reviews.llvm.org/D91937
This adds a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.
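A hedged sketch of the fold in DAGCombiner-style code (operand names are illustrative, not the exact in-tree code):
```
// sub 0, (splat_vector (sub 0, X)) --> splat_vector X
if (isNullOrNullSplat(N0) && N1.getOpcode() == ISD::SPLAT_VECTOR) {
  SDValue Inner = N1.getOperand(0);
  if (Inner.getOpcode() == ISD::SUB && isNullConstant(Inner.getOperand(0)))
    return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, Inner.getOperand(1));
}
```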
Differential Revision: https://reviews.llvm.org/D103755
This change is NFC upstream. We pass in the loop's block to the kernel
rewriter explicitly, instead of assuming it's the loop's top block. This
change is made for downstream targets where this assumption doesn't hold.
Differential Revision: https://reviews.llvm.org/D104811
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.
This patch is NFC for fixed-width vectors.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104454
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.
The affected ARM test changes because a constant becomes non-opaque
and eventually enables some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.
Reviewed By: lebedev.ri, dmgreen
Differential Revision: https://reviews.llvm.org/D104832
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector
The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
same number of elements, then use LLT::vector(OtherTy.getElementCount()),
or, if the number of elements is halved/doubled, use .divideCoefficientBy(2)
or operator*. That is because there is no reason to specifically restrict
the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
just use fixed_vector. This will need to be fixed up in the future when
modifying the algorithm to also work for scalable vectors, and will need
then need additional tests to confirm the behaviour works the same for
scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
this is replaced by LLT::scalable_vector.
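A hedged usage sketch of the new constructors (the include path is assumed to be the one in use around this change; the commented types use MIR notation):
```
#include "llvm/Support/LowLevelTypeImpl.h"

void example() {
  llvm::LLT V4S32 = llvm::LLT::fixed_vector(4, 32);          // <4 x s32>
  llvm::LLT NxV2S64 = llvm::LLT::scalable_vector(2, 64);     // <vscale x 2 x s64>
  (void)V4S32;
  (void)NxV2S64;
}
```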
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104451
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc.
Having type symmetry with these is somewhat necessary when implementing support for 192-bit values.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104621
Stats added:
1. NumCleanupLandingPadsUnreachable: how many cleanup landing pads were optimized as unreachable
1. NumCleanupLandingPadsRemaining: how many cleanup landing pads remain
1. NumNoUnwind: Number of functions with nounwind attribute
1. NumUnwind: Number of functions with unwind attribute
DwarfEHPrepare is always run a single time as part of `TargetPassConfig::addISelPasses()` which makes it an ideal place near the end of the pipeline to record this information.
Example output from clang built with exceptions cumulative during thinLTO backend (NumCleanupLandingPadsUnreachable was not incremented):
"dwarfehprepare.NumCleanupLandingPadsRemaining": 123660,
"dwarfehprepare.NumNoUnwind": 323836,
"dwarfehprepare.NumUnwind": 472893,
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104161
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that to use sext.w instead of a pair
of shifts.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D104612
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104807
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.
Differential Revision: https://reviews.llvm.org/D104807
This is from the discussion in https://reviews.llvm.org/D104247#inline-993387
The contract and reassoc flags shouldn't imply each other.
All the aggressive fsub fusions reassociate operations, so
we should guard them with a reassoc flag check.
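A hedged sketch of the kind of guard meant here, in visitFSUBForFMACombine-style code (not necessarily the exact in-tree condition):
```
// Aggressive, reassociating fusions are gated on the reassoc flag (or
// global unsafe-fp-math), not merely on contract.
bool CanReassociate =
    Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
if (!CanReassociate)
  return SDValue();
```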
Reviewed By: mcberg2017
Differential Revision: https://reviews.llvm.org/D104723
Summary:
generate eh_info when vector registers are saved according to the traceback table.
struct eh_info_t {
unsigned version; /* EH info version 0 */
#if defined(64BIT)
char _pad[4]; /* padding */
#endif
unsigned long lsda; /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};
The values of lsda and personality are zero when the number of saved vector registers is larger than zero and the function has no personality routine.
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
This patch aims to add the scalable property to LLT. The rest of the
patch-series changes the interfaces to take/return ElementCount and
TypeSize, which both have the ability to represent the scalable property.
The changes are mostly mechanical and aim to be non-functional changes
for fixed-width vectors.
For scalable vectors some unit tests have been added, but no effort has
been put into making any of the GlobalISel algorithms work with scalable
vectors yet. That will be left as future work.
The work is split into a series of 5 patches to make reviews easier.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104450
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving. This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.
We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.
Differential Revision: https://reviews.llvm.org/D103338
According to IR LangRef, the FMF flag:
contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).
reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.
My understanding is that these two flags shouldn't imply each other,
as we might have an SDNode that can be reassociated with others, but
not contractable.
e.g., we may want the following fmul/fadd/fsub to freely reassociate, but don't
want an fma to be generated here.
%F = fmul reassoc double %A, %B ; <double> [#uses=1]
%G = fmul reassoc double %C, %D ; <double> [#uses=1]
%H = fadd reassoc double %F, %G ; <double> [#uses=1]
%I = fsub reassoc double %H, %E ; <double> [#uses=1]
Before https://reviews.llvm.org/D45710, the `reassoc` flag actually
did not imply isContractable either.
The current implementation also only checks the flag on the fadd node,
ignoring the fmul node; this patch updates that as well.
Reviewed By: spatel, qiucf
Differential Revision: https://reviews.llvm.org/D104247
Fixes a minor bug when trying to iterate through use operands when
updating debug use operands.
Extends a test to include above.
Differential Revision: https://reviews.llvm.org/D104576
TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
- Distinct metadata needs generating in the codegen to attach correct
AAInfo on the loads/stores after lowering, merging, and other relevant
transformations.
- This patch adds 'MachineModuleSlotTracker' to help assign slot
numbers to these newly generated unnamed metadata nodes.
- To help 'MachineModuleSlotTracker' track machine metadata, the
original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
which provides basic interfaces to create/retrieve metadata slots. In
addition, once LLVM IR is processed, additional hooks are also
introduced to help collect machine metadata and assign them slot
numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
an additional 'machineMetadataNodes' field containing all the
definitions of those nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103205
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling-probe-related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen in build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
The Interleaved Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.
Differential Revision: https://reviews.llvm.org/D104255
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that, when vectorizing
calls, several passes did not support intrinsics with more than one
overloaded argument. This patch allows overloading of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
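As a hedged sketch of what the overloaded exponent type enables (the helper, module and builder below are hypothetical, not part of the patch):
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Request a powi declaration whose exponent type matches the target's int,
// e.g. an i16 exponent on a 16-bit-int target:
//   declare double @llvm.powi.f64.i16(double, i16)
Value *emitPowi(Module &M, IRBuilder<> &B, Value *Base, Value *Exp) {
  Function *Powi = Intrinsic::getDeclaration(
      &M, Intrinsic::powi, {Base->getType(), Exp->getType()});
  return B.CreateCall(Powi, {Base, Exp});
}
```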
Differential Revision: https://reviews.llvm.org/D99439
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the var to the use list in the pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.
Here I add the variable to the use list in the IR-level AddDiscriminators
pass. The machine-level code is still kept for the case where IR's
AddDiscriminators is not invoked. If that is the case, just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.
Differential Revision: https://reviews.llvm.org/D103988
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the var to the use list in the pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.
In this patch, we use a "common" linkage for this variable so that
it will be GC'ed by the linker.
Differential Revision: https://reviews.llvm.org/D103988
Iff we have `SCALAR_TO_VECTOR` (and we demand its only defined 0'th element),
and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element
of some vector, then we can just continue traversal into said source vector.
This comes up in X86 vector uniform shift lowering.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104250
6e5628354e regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction. This uses a bit more compiler magic to deal with that.
The sorting, obviously, must be stable, else we will have random assembly fluctuations.
Apparently there was no test coverage that would benefit from that,
so I've added one test.
The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104187
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
Register allocation may spill virtual registers to the stack, which can
increase alignment requirements of the stack frame. If the function
did not require stack realignment before register allocation, the
registers required to do so may not be reserved/available. This results
in a stack frame that requires realignment but can not be realigned.
Instead, only increase the alignment of the stack if we are still able
to realign.
The register SpillAlignment will be ignored if we can't realign, and the
backend will be responsible for emitting the correct unaligned loads and
stores. This seems to be the assumed behaviour already, e.g.
ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot
are both `canRealignStack` aware.
Differential Revision: https://reviews.llvm.org/D103602
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
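A small sketch of the replacement call pattern (variable names are illustrative):
```cpp
#include <string>
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/StringExtras.h"
using namespace llvm;

std::string printDecimal(const APInt &Val) {
  // Previously: Val.toString(10, /*Signed=*/true);
  // The free function in StringExtras.h wraps the SmallString-based version.
  return toString(Val, /*Radix=*/10, /*Signed=*/true);
}
```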
Differential Revision: https://reviews.llvm.org/D103888
When reducing vector builds to shuffles it is possible that
the DAG combiner may try to extract invalid subvectors.
This happens as the existing code assumes vectors will have power-of-2
sizes, which is already untrue, but becomes more noticeable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103880
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI), very late in the compilation
pipeline.
-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.
Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.
Reviewed By: rsmith, qcolombet
Differential Revision: https://reviews.llvm.org/D103928
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D95425
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103202
As shown in:
https://llvm.org/PR50623
...and the similar tests here, we were not accounting for
store merging of different sizes that do not cover the
entire range of the wide value to be stored.
This is the easy fix: just make sure that all of the
original stores are the same size, so when we calculate
the wide width, it's a simple N * M check.
This still allows all of the motivating optimizations from:
D86420 / 54a5dd485c
D87112 / 7a06b166b1
We could enhance this code to track individual bytes and
allow merging multiple sizes.
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.
This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.
This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103884
G_INSERT legalization is incomplete and doesn't work very
well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES
padding with undef values (although this can get pretty large).
For the case of load/store narrowing, this is still performing the
load/stores in irregularly sized pieces. It might be cleaner to split
this down into equal sized pieces, and rely on load/store merging to
optimize it.
When narrowing G_ADD and G_SUB, handle types that aren't a multiple of
the type we're narrowing to. This allows us to handle types like s96
on 64 bit targets.
Note that the test here has a couple of dead instructions because of
the way the setup legalizes. I wasn't able to come up with a way to
write this test that avoids that easily.
Differential Revision: https://reviews.llvm.org/D97811
When narrowing G_INSERT, handle types that aren't a multiple of the
type we're narrowing to. This comes up if we're narrowing something
like an s96 to fit in 64 bit registers and also for non-byte multiple
packed types if they come up.
This implementation handles these cases by extending the extra bits to
the narrow size and truncating the result back to the destination
size.
Differential Revision: https://reviews.llvm.org/D97791
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))
If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.
Fixes PR50609
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76a.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.
The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D102515
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only i32 and i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.
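Roughly, the change amounts to passing the extra flag through; this is a sketch, not the exact diff:
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// A truncated constant zero is still zero, so allow truncation when
// matching a null or null-splat value.
static bool isNullOrNullSplatSketch(SDValue V, bool AllowUndefs) {
  ConstantSDNode *C =
      isConstOrConstSplat(V, AllowUndefs, /*AllowTruncation=*/true);
  return C && C->isNullValue();
}
```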
Differential Revision: https://reviews.llvm.org/D103756
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759
Was reverted in 0507fc2ffc: in phi-coalesce-subreg.mir I'd explicitly named
some passes to run instead of specifying a range. As a result some
two-address instructions weren't correctly rewritten and the verifier got upset.
Original commit message:
[DebugInstrRef][2/3] Track PHI values through register coalescing
In the instruction referencing variable location model, we store variable
locations that point at PHIs in MachineFunction during register allocation.
Unfortunately, register coalescing can substantially change the locations
of registers, and so that PHI-variable-location side table needs
maintenance during the pass.
This patch builds an index from the side table, and whenever a vreg gets
coalesced into another vreg, update the index to record the new vreg that
the PHI happens in. It also accepts a limited range of subregister
coalescing, for example merging a subregister into a larger class.
Differential Revision: https://reviews.llvm.org/D86813
This patch extends the SelectionDAG's ability to constant-fold vector
arithmetic to include support for SPLAT_VECTOR. This is not only for
scalable-vector types but also for fixed-length vector types, which
helps Hexagon in a couple of cases.
The original RISC-V test case was in fact an infinite DAGCombine loop.
The pattern `and (truncate v1), (truncate v2)` can be combined to
`truncate (and v1, v2)` but the truncate can similarly be combined back
to `and (truncate v1), (truncate v2)` (but, crucially, only when one of `v1` or
`v2` is a constant vector).
It wasn't exposed on fixed-length types because a TRUNCATE of a
constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas
this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D103246
Summary: When -strict-dwarf=true is specified, the calling convention info
DW_CC_pass_by_value or DW_CC_pass_by_reference can only be generated at DWARF5.
Reviewed By: shchenz, dblaikie
Differential Revision: https://reviews.llvm.org/D103300
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.
For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences. If someone wants to improve
that, the PowerPC backend could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .
Differential Revision: https://reviews.llvm.org/D103342
This is a followup to D103422. The DenseMapInfo implementations for
ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h
headers, which means that these two headers no longer need to be
included by DenseMapInfo.h.
This required adding a few additional includes, as many files were
relying on various things pulled in by ArrayRef.h.
Differential Revision: https://reviews.llvm.org/D103491
In the instruction referencing variable location model, we store variable
locations that point at PHIs in MachineFunction during register
allocation. Unfortunately, register coalescing can substantially change
the locations of registers, and so that PHI-variable-location side table
needs maintenance during the pass.
This patch builds an index from the side table, and whenever a vreg gets
coalesced into another vreg, update the index to record the new vreg that
the PHI happens in. It also accepts a limited range of subregister
coalescing, for example merging a subregister into a larger class.
Differential Revision: https://reviews.llvm.org/D86813
The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for
scalable vector types, and has numerous places in which it tries to grab
either the fixed size or number of elements of its types.
I believe that it should be possible to update this method to properly
account for scalable-vector types, but we don't have test cases for
that; RISC-V bails out early on as it has legal i1 vector masks. As
such, this patch just prevents it from crashing.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103536
The attached tests check for the regression in DAGCombiner's
`visitVSELECT`, which may call this method.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103534
This extends 434c8e013a and ede3982792 to handle signed
predicates by sign-extending the setcc operands.
This is not shown directly in https://llvm.org/PR50055 ,
but the pattern is visible by changing the unsigned convert
to signed in the source code.
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for the ProfileData part of the change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.
This patch also simplified the bit API used by FS discriminators.
Differential Revision: https://reviews.llvm.org/D103041
This is a follow-up to D103280 that eases the use restrictions,
so we can handle the motivating case from:
https://llvm.org/PR50055
The loop code is adapted from similar use checks in
ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an
easier way to filter out non-chain uses of load values.
Differential Revision: https://reviews.llvm.org/D103462
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.
This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.
Differential Revision: https://reviews.llvm.org/D103050
The FPOWI DAG node is normally lowered to a libcall to one of the
RTLIB::POWI* runtime functions and the exponent should normally
have a type matching sizeof(int) when making the call. Thus,
type promotion of the exponent could lead to an FPOWI with a type
for the second operand that would be incorrect when doing the
libcall (a situation which would be hard to detect post-legalization
if we allow such FPOWI nodes).
This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to
do the rewrite into a libcall directly instead of promoting the
operand. This way we can check that the exponent is smaller than
sizeof(int) and we can let TargetLowering handle promotion as
part of making the libcall. It could be noticed here that makeLibCall
has some knowledge about targets such as 64-bit RISCV, for which the
libcall argument should be extended to a type larger than sizeof(int).
Differential Revision: https://reviews.llvm.org/D102950
D85085 was pushed earlier but broke tests on mac and win:
http://lab.llvm.org:8080/green/job/clang-stage1-RA/21182/consoleFull#-706149783d489585b-5106-414a-ac11-3ff90657619c
Recommitting it after adding mtriple to the llc commands.
Emit correct location lists with basic block sections.
This patch addresses multiple things:
1) It ensures that const_value is emitted when possible with basic block
sections.
2) It emits location lists such that the labels are always within the
section boundary.
3) It fixes a bug when the parameter is first used in a non-entry block
which is in a different section from the entry block.
Differential Revision: https://reviews.llvm.org/D85085
It's still in use in a few places so we can't delete it yet, but there aren't
many at this point.
Differential Revision: https://reviews.llvm.org/D103352
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass are to ensure the operands of the ADD
instruction have the expected order (the dest register of the LEA should be the src register of the ADD).
Differential Revision: https://reviews.llvm.org/D101970
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
addr-label.ll crashes on ARM due to this change. This is because a
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case all referencing blockaddresses are RAUW'd with a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen while a
ARMConstantPoolConstant has been created.
Reviewed By: rnk, MaskRay, compnerd
Differential Revision: https://reviews.llvm.org/D99707
I accidentally pushed a draft of D103280 that was discussed
during the review, but it was not supposed to be the final
version.
Rather than revert and recommit, I'm updating the existing
code. This way we have a record of the codegen diff that
would result if we decide to remove this predicate in the
future.
sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) --
(when the zexts are free and a bunch of other conditions)
We have a couple of similar folds to this already for vector selects,
but this pattern slips through because it is only a setcc.
The tests are based on the motivating case from:
https://llvm.org/PR50055
...but we need extra logic to get that example, so I've left that as
a TODO for now.
Differential Revision: https://reviews.llvm.org/D103280
D35953, D62650 and D73691 introduced trimming of variable locations
in the LiveDebugVariables pass, since there are some cases where, after
the virtregrewrite, we have an exploded number of DBG_VALUEs created for some
inlined variables. As it looks, all problematic cases involved
inlined variables, so it seems reasonable to stop trimming the location
ranges for non-inlined variables.
It has very good impact on the llvm-locstats report.
Differential Revision: https://reviews.llvm.org/D102917
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.
To avoid introducing new UB, we can mask the index so it only contains
valid indices.
Fixes PR50382.
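A rough sketch of the masking idea, assuming a power-of-two element count (the helper and variable names are illustrative, not the actual patch):
```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// Clamp the extract index into [0, NumElts) so the scalarized load cannot
// read out of bounds even when the original index was out of range.
static Value *maskExtractIndex(IRBuilder<> &Builder, Value *Idx,
                               unsigned NumElts) {
  assert(isPowerOf2_32(NumElts) && "sketch only covers power-of-two counts");
  return Builder.CreateAnd(Idx,
                           ConstantInt::get(Idx->getType(), NumElts - 1));
}
```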
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D103077
Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if
`Reloc::Static`. This wasn't strictly necessary until recently when dso_local was
no longer implied by `TargetMachine::shouldAssumeDSOLocal` for
`__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary
GOT relocations.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D102646
Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D103232
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.
At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standard and LLVM now accept arbitrary combinations of
success/failure orderings.
This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)
Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.
Differential Revision: https://reviews.llvm.org/D103284
Parameter positions seem like they should be unsigned.
While there, make function names lowercase per coding standards.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D103224
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.
Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
Also changes the fewerElements helper to use the lookthrough constant helper
instead of m_ICst, since m_ICst doesn't look through extends.
Differential Revision: https://reviews.llvm.org/D103227
Adjusting the load register type is a widenScalar type action, not a
lowering. lowerLoad should be reserved for operations that change the
memory access size, such as unaligned load decomposition. With this
trying to adjust the register type, it was hard to avoid infinite
loops in the legalizer. Adds a bandaid to avoid regressing a few
AArch64 tests, but I'm not sure what the exact condition is and
there's probably a cleaner way to do this.
For AMDGPU this regresses handling of some cases for unaligned loads,
but the way this is currently working is a pretty ugly hack.
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.
This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103027
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.
The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.
While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the callers of the
function to turn off the use of truncating stores when preferring
vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103173
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-zeros or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.
This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.
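A minimal sketch of the bitwise expansion described above for same-typed boolean vectors (getNOT is the XOR-with-all-ones; names are illustrative):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// vselect Cond, T, F  ==>  (Cond & T) | (~Cond & F)
// Valid here because Cond, T and F are boolean vectors of the same type.
static SDValue expandBoolVSelect(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                 SDValue Cond, SDValue T, SDValue F) {
  SDValue NotCond = DAG.getNOT(DL, Cond, VT);
  SDValue TruePart = DAG.getNode(ISD::AND, DL, VT, Cond, T);
  SDValue FalsePart = DAG.getNode(ISD::AND, DL, VT, NotCond, F);
  return DAG.getNode(ISD::OR, DL, VT, TruePart, FalsePart);
}
```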
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103147
This is a port from the DAG legalization. We're still missing some of the
canonicalizations of shuffles but it's a start.
Differential Revision: https://reviews.llvm.org/D102828
There were a bunch of lost debug location remarks that show up when legalizing
tail calls on AArch64.
This would happen because we drop the return in the block where we emit the
tail call. So, we end up dropping the debug location, which makes the
LostDebugLocObserver report a missing debug location.
Although it's *true* that we lose these debug locations, this isn't
a particularly useful remark. We expect to drop these debug locations when
emitting tail calls. Suppressing remarks in this case is preferable, since the
amount of noise could hide actual debug location related bugs.
To do this, I just plumbed the LostDebugLocObserver through the relevant
LegalizerHelper functions. This is the only case I can think of where we need
the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it
in the LegalizerHelper proper and mucking around with the constructors, I
figured it'd be cleanest to take the simplest path for now.
This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at
-Os.
Differential Revision: https://reviews.llvm.org/D103128
This patch addresses multiple things:
1) It ensures that const_value is emitted when possible with basic block
sections.
2) It emits location lists such that the labels are always within the
section boundary.
3) It fixes a bug when the parameter is first used in a non-entry block
which is in a different section from the entry block.
Differential Revision: https://reviews.llvm.org/D85085
This patch introduces "DBG_PHI" instructions, a marker of where a PHI
instruction used to be, before PHI elimination. Under the instruction
referencing model, we want to know where every value in the function is
defined -- and a PHI, even if implicit, is such a place.
Just like instruction numbers, we can use this to identify a value to be
used as a variable value, but we don't need to know what instruction
defines that value, for example:
bb1:
DBG_PHI $rax, 1
[... more insts ... ]
bb2:
DBG_INSTR_REF 1, 0, !1234, !DIExpression()
This specifies that on entry to bb1, whatever value is in $rax is known
as value number one -- and the later DBG_INSTR_REF marks the position
where variable !1234 should take on value number one.
PHI locations are stored in MachineFunction for the duration of the
regalloc phase in the DebugPHIPositions map. The map is populated by
PHIElimination, and then flushed back into the instruction stream by
virtregrewriter. A small amount of maintenance is needed in
LiveDebugVariables to account for registers being split, but only for
individual positions, not for entire ranges of blocks.
Differential Revision: https://reviews.llvm.org/D86812
DwarfDebug unconditionally assumes for all call instructions the 0th
operand is the callee operand, which seems to be true for other targets,
but not for WebAssembly. This adds a `TargetInstrInfo::getCallOperand`
method whose default implementation returns `getOperand(0)`, and makes
WebAssembly override it to use its own utility method to get the callee
operand.
This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was
uncovered by this CL.
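For illustration, a free-function version of the default behaviour described above (the actual hook signature is assumed, not copied from the patch):
```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Default assumption on most targets: the callee is operand 0 of the call
// instruction. WebAssembly needs its own logic instead.
static const MachineOperand &getCallOperandDefault(const MachineInstr &MI) {
  return MI.getOperand(0);
}
```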
Reviewed By: dschuff, djtodoro
Differential Revision: https://reviews.llvm.org/D102978
Support virtual, physical and tied i128 register operands in inline assembly.
i128 is not really supported on SystemZ and is not a legal type; generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.
For inline assembly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBuilder to only create one
GR128 register part to begin with. In particular:
- getNumRegisters() now has an optional parameter "RegisterVT" which is
passed by AddInlineAsmOperands() and GetRegistersForValue().
- The bitcasting in GetRegistersForValue is not performed if RegVT is
Untyped.
- The RC for a tied use in AddInlineAsmOperands() is now computed either from
the tied def (virtual register), or by getMinimalPhysRegClass() (physical
register).
- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
class (DstRC) can also be computed for an illegal type.
In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.
Differential Revision: https://reviews.llvm.org/D100788
Review: Ulrich Weigand
Precursor to D100944. The logic for determining the unique ID had become
quite difficult to reason about, so I have factored this out into a
separate function.
Differential Revision: https://reviews.llvm.org/D102336
- When memory intrinsics, such as memcpy, are lowered, the attached scoped AA
metadata is not passed down to the backend. As a result, the backend
cannot schedule relevant memory operations around them following that
hint. In this patch, SelectionDAG is enhanced to propagate that
metadata (scoped AA only) when they are lowered into loads and stores.
Differential Revision: https://reviews.llvm.org/D102215
The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).
Differential Revision: https://reviews.llvm.org/D102747
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
D29668 made it possible to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced, the upper bits are filled with garbage, producing code that misbehaves at runtime.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102153
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
SelectionDAG forces us to have a weird ABI for 16-bit values without
legal 16-bit operations, but currently GlobalISel bypasses this and
sometimes ends up using the gfx8+ ABI in some contexts. Make sure
we're testing the normal ABI to avoid a test change in a future patch.
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.
This reverts commit 4397b7095d.
The function `reduceOperationWidth` helps to legalize a vector
operation either by narrowing its type or by scalarizing the
operation itself. It currently supports instructions with one result.
This patch, in addition, allows the same for instructions with two
results (for instance, G_SDIVREM).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100725
Previously APFloat::convertToDouble may be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.
With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. This makes the conversion operations more convenient and
adds support for formats like half and bfloat.
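A small usage sketch (the value is arbitrary):
```cpp
#include "llvm/ADT/APFloat.h"
using namespace llvm;

double halfToDouble() {
  // A half-precision value is exactly representable as double, so the
  // conversion no longer requires the APFloat to use double semantics.
  APFloat H(APFloat::IEEEhalf(), "2.5");
  return H.convertToDouble(); // 2.5
}
```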
Differential Revision: https://reviews.llvm.org/D102671
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.
Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.
This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.
I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.
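Sketch of the new call shape, with the flag name taken from the description above (the wrapper itself is just for illustration):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Returns a type-legal splat value for V, or an empty SDValue if one cannot
// be produced without introducing an illegal type.
static SDValue getLegalSplat(SelectionDAG &DAG, SDValue V) {
  return DAG.getSplatValue(V, /*LegalTypes=*/true);
}
```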
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102687
Currently, variadic dbg.values (i.e. those using a DIArgList as part of
their location) are not handled properly by FastISel or GlobalISel, and
will produce invalid DBG_VALUE instructions if they encounter them. This
patch fixes this issue by emitting undef DBG_VALUE instructions for
variadic dbg.values, so that no incorrect instruction is produced and
any prior variable location is terminated.
This is simply a quick-fix to prevent errors; a correct implementation
should come later for these ISel pipelines to ensure that we do not drop
debug information unnecessarily.
Differential Revision: https://reviews.llvm.org/D102500
When trying to return a type such as <vscale x 1 x i32> from a
function we crash in DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR
when attempting to get the fixed number of elements in the vector.
For the simple case we are dealing with, i.e. extracting
<vscale x 1 x i32> from index 0 of input vector <vscale x 4 x i32>
we can simply rely upon existing code that just returns the input.
Differential Revision: https://reviews.llvm.org/D102605
When attempting to return something like a <vscale x 1 x i32>
type from a function we end up trying to widen the vector by
inserting a <vscale x 1 x i32> subvector into an undefined
<vscale x 4 x i32> vector. However, during legalisation we
then attempt to widen the INSERT_SUBVECTOR operands and hit
an error in WidenVectorOperand.
This patch adds a new WidenVecOp_INSERT_SUBVECTOR function
that currently only supports inserting subvectors into undefined
vectors.
Differential Revision: https://reviews.llvm.org/D102501
The operation of some VP intrinsics do/will not map to regular
instruction opcodes. Returning 'None' seems more intuitive here than
'Instruction::Call'.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102778
- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.
The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):
```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```
1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional
- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry, and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction, and TokD as a possible Remarks field.
- For TokC (the Operand Entry), no spaces are allowed between operand entries. If a space occurs it is classified as an error.
- TokD, if provided, is taken as-is and emitted as a comment.
The following additional approach was examined, but not taken:
- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.
Testing:
- This patch doesn't have tests added with it, for the sole reason that MCStreamer support and object file support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. They are in the process of being committed / worked on.
- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D98276
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.
The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.
We can't do much more than this currently because we also fail to propagate flags
from x86-specific node to generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.
Differential Revision: https://reviews.llvm.org/D102725
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass are to ensure the operands of the ADD
instruction have the expected order (the dest register of the LEA should be the src register
of the ADD).
Differential Revision: https://reviews.llvm.org/D101970
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.
For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will be checked by the create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.
Differential Revision: https://reviews.llvm.org/D102246
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
This is a reland after an MSan fix in D102667.
Differential Revision: https://reviews.llvm.org/D101713
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs.
E.g.
x == x -> true
x != x -> false
load(x) > 1 -> true (when the load is known to be greater than 1)
And so on.
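As a sketch, this is the kind of KnownBits.h helper being reused; the Optional is empty when the known bits do not decide the compare (the function name is illustrative):
```cpp
#include "llvm/ADT/Optional.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Fold an equality compare when the known bits of both operands already
// decide the result; otherwise return None and keep the G_ICMP.
static Optional<bool> foldICmpEq(const KnownBits &LHS, const KnownBits &RHS) {
  return KnownBits::eq(LHS, RHS);
}
```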
Differential Revision: https://reviews.llvm.org/D102542
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of a _try region, i.e., no potentially
faulty instruction can be moved across a _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception that occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instructions (the previous iload/istore idea) so that
the exception path is modeled in the flow graph precisely. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block receives various states from its predecessors, the lowest
state wins over the others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject an eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsics are created to track CPP object scopes: eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless of whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark
the _try boundary and to prevent exceptions from being moved across the _try boundary.
All memory instructions inside a _try are considered 'volatile' to assure the
2nd and 3rd rules for C code above. This is a little suboptimal, but it's
acceptable as the amount of code directly under a _try is very small.
Part-2 (will be in the Part-2 patch): LLVM implementation described below.
For both C++ and C code, the state of each block is computed at the same place
in the backend (the WinEHPrepare pass) where all other EH tables/maps are
calculated. In addition to _scope_begin and _scope_end, the computation of block
state also relies on the existing state tracking code (UnwindMap and
InvokeStateMap).
For both C++ and C code, the state of each block with a potentially trapping
instruction is marked and reported in the DAG instruction selection pass, the
same place where the state for -EHsc (synchronous exceptions) is handled.
If the first instruction in a reported block scope can trap, a nop is injected
before it. This nop is needed to accommodate the LLVM Windows EH implementation,
in which the address in the IPToState table is offset by +1 (the purpose of that
offset is to ensure the return address of a call is in the same scope as the
call address).
The handler for catch(...) under -EHa must handle HW exceptions, so its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
The push/popTerminate() scope (from noexcept/noThrow) is suppressed so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:
1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).
Address pr/47341 for aarch64.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen
Differential Revision: https://reviews.llvm.org/D100919
Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".
Support is added for AArch64 and X86 for now.
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102393
ScheduleDAGFast.cpp is compiled to an object file, but that object file isn't
linked into the clang executable because no symbol in it is referenced from
outside. Add calls to the createXxx functions of ScheduleDAGFast.cpp so that the
object file is linked into the clang executable. The static RegisterScheduler
will then register the 'fast' and 'linearize' schedulers at clang start-up.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D101601
This adds a simple fold to codegenprepare that converts comparisons feeding
branches into comparisons with zero where possible. For example:
%c = icmp ult %x, 8
br %c, bla, blb
%tc = lshr %x, 3
becomes
%tc = lshr %x, 3
%c = icmp eq %tc, 0
br %c, bla, blb
As a first-order approximation, this can reduce the number of instructions
needed to perform the branch, as the shift is (often) needed anyway. At the
moment this does not affect very much, as llvm tends to prefer the opposite
form. But it can protect against regressions from commits like rG9423f78240a2.
Simple cases of Add and Sub are handled along with Shift, as the comparison
with zero can likewise often be folded into the cpsr flags.
Differential Revision: https://reviews.llvm.org/D101778
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.
<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>
"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.
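As a hedged illustration of the format described above (file name, line number,
and sizes here are made up):
```
/* foo.c, compiled with: clang -fstack-usage -c foo.c
   A hypothetical foo.su could then contain a line such as:
   foo.c:2:bar  48  static                                                   */
int bar(void) {
  char buf[32];   /* only fixed-size objects, so the frame size is "static" */
  return buf[0];
}
```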
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100509
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
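A hedged sketch of how a stack-walking tool might interpret those bits
(hypothetical helper; the constants follow the list above):
```
#include <cstdint>

// Bits 63:60 of the saved FP distinguish the record kinds listed above.
bool isExtendedAsyncFrame(uint64_t SavedFP) {
  uint64_t Tag = SavedFP >> 60;
  return Tag == 0b0001;   // [X22, FP, LR]; 0b0000 / 0b1111 are non-extended
}
```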
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
Since 5de2d189e6 this particular warning
hasn't had the location of the source file containing the inline
assembly.
Fix this by reporting via LLVMContext, which means that we no longer have the
"instantiated into assembly here" lines, but they were going to point to the
start of the inline asm string anyway.
This message is already tested via IR in llvm. However we won't have
the required location info there so I've added a C file test in clang
to cover it.
(though strictly, this is testing llvm code)
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D102244
I've taken the following steps to add unwinding support from inline assembly:
1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
to label %exit unwind label %uexit
```
2) Add Bitcode writing/reading support + LLVM-IR parsing.
3) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.
4) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.
5) Add clang support by introducing a new clobber: "unwind", which lowers to `canThrow` being enabled.
6) Don't allow unwinding callbr.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
We want it to be available in analyses so that we can use the CodeGen notion in
middle-end passes (for example, to check if a GC may free some particular
pointer).
This is a preparatory patch that simply moves the files around.
Note: if this causes build issues, this patch should simply be reverted.
Differential Revision: https://reviews.llvm.org/D100557
Reviewed By: reames
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.
Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.
These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.
Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.
There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D102073
Fixes a bug in the DAG combiner that eliminated stores because it failed to
inspect the address space of the pointers, as in:
%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2
As well as
store %v, %ptr_as1
store %v, %ptr_as2
Fixes a test for the above in X86.
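A minimal sketch of the kind of address-space check the fix relies on
(hypothetical helper; not the exact in-tree code):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Two memory operations should only be treated as accessing the same location
// if their pointers live in the same address space; distinct address spaces
// may be distinct memories even when the pointer values look identical.
static bool sameAddressSpace(const MemSDNode *A, const MemSDNode *B) {
  return A->getAddressSpace() == B->getAddressSpace();
}
```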
Differential Revision: https://reviews.llvm.org/D102096
MachineRegisterInfo caches the reserved register set that is computed by
TargetRegisterInfo::getReservedRegs, so call into MRI to get the
reserved regs to avoid recomputing them.
In particular this speeds up AMDGPU's SIFormMemoryClauses pass because
AMDGPU has a particularly complicated reserved set that is expensive to
compute.
Differential Revision: https://reviews.llvm.org/D102318
Previous crashes caused by this patch were the result of machine
subregisters being incorrectly handled in updateDbgUsersToReg; this has
been fixed by using RegUnits to determine overlapping registers, instead
of using the register values directly.
Differential Revision: https://reviews.llvm.org/D101523
This reverts commit 7ca26c5fa2.
Currently the ValueHandler handles both selecting the type and
location for arguments, as well as inserting instructions needed to
handle them. Split this so that the determination of the argument
handling is independent of the function state. Currently the checks
for tail call compatibility do not follow the full assignment logic,
so it misses cases where arguments require nontrivial legalization.
This should help avoid targets ending up in a buggy state where the
argument evaluation may change in different contexts.
STATEPOINT is a fancy and complex pseudo instruction which
has both tied defs and a regmask operand.
Basic FastRA algorithm is as follows:
1. Mark registers used by defs as free
2. If instruction has regmask operand displace clobbered registers
according to regmask.
3. Assign registers for use operands.
In case of tied defs step 1 is replaced with allocation of registers
for them. But regmask is still processed, which may displace already
allocated registers. As a result, tied use and def will get assigned
to different registers.
This patch makes FastRA process the instruction's RegMask (if any) when
checking for physical register interference.
That way tied operands won't be assigned registers clobbered by the regmask.
Reviewed By: arsenm, skatkov
Differential Revision: https://reviews.llvm.org/D99284
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
Differential Revision: https://reviews.llvm.org/D101713
For contiguous ranges we drop the last bit-test case but in doing so we skip
adding the new MBB PHI edges to the list of replacement PHI edges, and as a
result we incorrectly omit them in the G_PHI in finishPendingPhis().
Was found when bootstrapping clang with -O3 and GlobalISel enabled on Apple Silicon.
The logic for x86_64 position-independent TType encodings was backwards,
using 8 bytes where 4 were wanted and 4 where 8 were wanted. For regular
x86_64, this was mostly harmless, exception tables are allowed to use
8-byte encodings even when it is not needed. For the large code model,
and for X32, however, the generated exception tables were wrong. For the
large code model, we cannot assume that the address will fit in 4 bytes.
For X32, we cannot use 64-bit relocations.
Fixes PR50148.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102132
Expanding a fixed-length operation involves wrapping the operation in an
insert/extract subvector pair; as such, when this is done to a bitcast we end up
with an extract_subvector of a bitcast. DAGCombine tries to convert this into a
bitcast of an extract_subvector, which restores the initial fixed-length
bitcast, causing an infinite loop of legalization.
As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.
Differential Revision: https://reviews.llvm.org/D101990
A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed a fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.
This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.
Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102082
The function template `CallLowering::setArgFlags` is invoked both
for arguments and return values. In the latter case, it calls
`getParamStackAlign` with argument index `~0u`. Nothing goes wrong at the
moment, as the argument safely wraps back to 0 when incremented inside
`getParamStackAlign` (the type is `unsigned`), but in principle this is fragile
and may become incorrect.
Differential Revision: https://reviews.llvm.org/D102004
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.
We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.
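A minimal sketch of the conditional legality check this describes (assumed
helper name; not the exact in-tree code):
```
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Query the splat-forming opcode that matches the vector type: scalable
// vectors must use SPLAT_VECTOR, fixed-length ones use BUILD_VECTOR.
static bool canFormSplat(const TargetLowering &TLI, EVT VT) {
  unsigned Opc = VT.isScalableVector() ? ISD::SPLAT_VECTOR : ISD::BUILD_VECTOR;
  return TLI.isOperationLegalOrCustom(Opc, VT);
}
```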
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102063
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
Similar to X86 D73230 & 46788a21f9
With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.
For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.
Note: the 'S' inline assembly constraint refers to an absolute symbolic address
or a label reference (D46745).
Differential Revision: https://reviews.llvm.org/D101872
Reapply b623df3c, which was reverted while reverting a different patch
with a breaking change. There are no underlying issues with this patch,
so no changes have been made to the original patch.
This reverts commit b11e4c9907.
This patch fixes a crash in the compiler that occurs when certain
invalidated SDDbgValues are emitted. The cause of this was that we would
attempt to check the liveness of the debug value's operands, which
triggers an assert if any of those operands are invalid. This patch
changes this check such that it only occurs if the SDDbgValue is valid;
if not, the check is irrelevant anyway, so can be safely ignored.
Differential Revision: https://reviews.llvm.org/D101540
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.
Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.
I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).
NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.
Differential Revision: https://reviews.llvm.org/D101987
This patch modifies updateDbgUsersToReg to properly handle
DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices
(i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and
updating the register for all matching operands.
Differential Revision: https://reviews.llvm.org/D101523
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.
ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.
Differential Revision: https://reviews.llvm.org/D101367
Add a new wrapper function addAttribute() for the Die.addValue() function,
so we can do attribute control in one single interface.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101125
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
This change enables emitting CFI unwind information for debugging purpose
for targets with MCAsmInfo::ExceptionsType == ExceptionHandling::None.
Currently, generating CFI unwind information is entangled with supporting
exceptions, even when AsmPrinter explicitly recognizes that the unwind
tables are being generated as debug information.
In fact, the unwind information is not generated even if we specify
--force-dwarf-frame-section, unless exceptions are enabled. The LIT test
llvm/test/CodeGen/AMDGPU/debug_frame.ll demonstrates this behavior.
Enable this option for AMDGPU to prepare for future patches which add
complete CFI support.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D78778
Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.
I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.
The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.
I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.
x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.
AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.
A new native GlobalISel call lowering framework should let the target
process the incoming types directly.
CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and LocVT was supposed to be the ABI passed type
(which is also legalized).
With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decide to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.
Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.
Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been casted to a vector
with a different element type.
This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.
I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.
I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.
TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
- Move the code preventing CSE of `isConvergent` instrs into
`ProcessBlockCSE` (from `isProfitableToCSE`)
- Add comments explaining why `isConvergent` is used to prevent
CSE of non-local instrs in MachineCSE and the new test
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: you can't construct a MOFI without an
MCContext, and you can't construct that MCContext without first giving it a
dummy version of the MOFI.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
This seems to have broken sanitizers, giving lots of
Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.
failures across multiple targets (currently X86 and PowerPC). Reverting
until I have a chance to reproduce and debug.
This reverts commit 6e876f9ded.
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
Fix a bug where buildZExtInReg will create and use a new register instead of using the register from parameter DstOp Res.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D101871
The statepoint instruction uses tied-def registers to represent a live GC
value, which is both a use and a def at the same time on a call.
At the same time, an invoke statepoint instruction is a last split point which
can throw and jump to a landing pad.
As a result we have an instruction which is a last split point with tied-def
registers, and we need to teach Greedy RA to work with it.
The option -use-registers-for-gc-values-in-landing-pad controls whether
statepoint lowering will generate tied defs for invoke statepoints and is off by
default for now.
To resolve all issues the following changes have been made:
1) The last split point for an invoke statepoint should be the statepoint itself.
If the statepoint has a def, it is a relocated GC pointer and must be available
in the landing pad, so we cannot split the interval after the statepoint at the
end of the basic block.
2) Do not split an interval on a tied def.
If the end of the interval for the overlap utility is a use which has a tied
def, we should not split the interval on this instruction, because in that case
the use and def may get different registers, which breaks the tied-def property.
3) Take the last split point into account in enterIntvAtEnd.
If the use after the last split point is a def, it must be a tied def, and we
can take the def of the tied use as ParentVNI, so that the tied use and tied
def will be live in the resulting interval.
4) Handle the case when the def is after the LIP in InlineSpiller.
If the def of the LI is after the last insertion point of the basic block, we
cannot hoist in this BB. An example of such an instruction is an invoke
statepoint whose def represents the relocated live GC pointer: the invoke is
the last insertion point and its def is located after it. In this case there is
no place to insert the spill and we bail out.
5) Fix removeBackCopies to account for empty copies.
RegAssignMap cannot hold an empty interval, so do not set the stop to the kill
value if that produces an empty interval.
This can happen if we remove a back-copy and right before it there is another
back-copy.
For example, for parent %0 we can get
%1 = COPY %0
%2 = COPY %0
and while removing %2 we cannot set the kill for %1 because its interval would
be empty.
6) Do not hoist a copy to a BB if its def is after the LSP.
If the parent def is at the last split point or later, we cannot hoist the copy
to this basic block because the inserted copy (or re-materialization) would be
located before the def.
All parts have been reviewed separately as follows:
https://reviews.llvm.org/D100747
https://reviews.llvm.org/D100748
https://reviews.llvm.org/D100750
https://reviews.llvm.org/D100927
https://reviews.llvm.org/D100945
https://reviews.llvm.org/D101028
Reviewers: reames, rnk, void, MatzeB, wmi, qcolombet
Reviewed By: reames, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D101150
This patch adds the two MVTs to fix a legalizer crash when using vector
shuffles of <256 x i16> and <128 x i16> on RISC-V. The legalizer can't
promote the operand of `v256i32 = any_extend_vector_inreg v128i16`.
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D101769
Removing an assertion introduced with D68945. The
patch was later reverted with 6531a78ac4, but failed
to remove this assertion. It causes a problem while
trying to split a 64-bit argument into sub registers.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D101594
This reverts the revert 02c5ba8679
Fix:
Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass
functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULE=1
builds.
Original commit:
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
Originally submitted as 3338290c18.
Reverted in c7df6b1223.
The comment about how to make use of debugger tuning within DwarfDebug
really belongs inside the DwarfDebug declaration, where it will be
easier to find.
This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.
For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.
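A minimal sketch of such an encoding, assuming a zigzag-style scheme (the
actual emitter code is not reproduced here):
```
#include <cstdint>

// The sign goes in the lowest bit; negative values have their remaining bits
// inverted, so small negative numbers become small unsigned values that need
// only a byte or two of VBR encoding.
uint64_t encodeSignedOperand(int64_t V) {
  return V < 0 ? (uint64_t(~V) << 1) | 1 : uint64_t(V) << 1;
}

int64_t decodeSignedOperand(uint64_t E) {
  return (E & 1) ? ~int64_t(E >> 1) : int64_t(E >> 1);
}
```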
I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously we assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
Summary:
This patch implements backend support for adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.
Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
Don't assert if there are unassigned virtual registers. Maintain
LiveIntervals by removing the RegUnits for allocated registers, since
they should no longer be necessary.
One part I find somewhat questionable is the special handling
necessary for handleIdentityCopy. The LiveIntervals for the relevant
regunits need to be removed.
In a future change it will be possible to run register
allocation with a specific set of register classes,
so some of the remaining virtual registers will still
be meaningful.
A value only used by metadata can be removed from the .addrsig table.
This fixes the undefined symbol error when enabling the addrsig table with COFF LTO.
Differential Revision: https://reviews.llvm.org/D101512
Summary:
A personality routine could be an alias to another personality routine.
Fix the situation where we compile a file that contains the personality
routine and that file also has functions that need to refer to it.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D101401
Functions can have section names set via #pragma or section attributes;
basic block sections should be correctly named for such functions.
With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named, with the unique flag set, so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug by
calling getExplicitSectionGlobal when the implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.
Differential Revision: https://reviews.llvm.org/D101311
Some live-ins *can* come from this block (e.g. any SSA value except the call);
it's only the ones that produce `landingpad` values that can't, and I didn't
think that through properly.
These registers get defined by the runtime, not the block being allocated, and
treating them as preassigned in RegAllocFast adds extra pressure, sometimes
enough to make the function unallocatable.
This reverts commit 3b8ec86fd5.
Revert "[X86] Refine AMX fast register allocation"
This reverts commit c3f95e9197.
This pass breaks using LLVM in a multi-threaded environment by
introducing global state.
This replaces D98479.
This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.
I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.
I limited this to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100803
This is a compile time optimization. DILocation::get() is expensive to call, and
we were calling it to create a line zero debug loc for *every* instruction we
translated. We only really need to do this just before we build constants in the
entry block, so I moved this code there. This reduces the LLVM -O0 codegen time
of sqlite3 IR by around 0.7% instructions executed and by about ~2% in CPU time.
We can probably do better with a more involved change, since the reason we need
to create one for each new constant is because we're using the debug scope and
inlined-at loc. If we just use a single instruction's scope and drop the
inlined-at, we can just cache these and have them be free.
This was picking a concrete size for a physical register, and
enforcing exact match on the virtual register's type size. Some
targets add multiple types to a register class, and some are smaller
than the full bit width. For example x86 adds f32 to 128-bit xmm
registers, and AMDGPU adds i16/f16 to 32-bit registers.
It might be better to represent these cases as a copy of the full
register and an extraction of the subpart, but a lot of code assumes
you can directly copy. This will help fix the current usage of the DAG
calling convention infrastructure which is incompatible with how
GlobalISel is now using it.
The API is somewhat cumbersome here, but I just mirrored the existing
functions, except now with LLTs (and allow returning null on failure,
unlike the MVT version). I think the concept of selecting register
classes based on type is flawed to begin with, but I'm trying to keep
this compatible with the existing handling.
This patch fixes a crash in LiveDebugVariables for inputs where a
DBG_VALUE_LIST had 64 or more debug operands. This was triggering an
assert, which was added under the assumption that only bad CodeGen would
result in such a limit being hit, but relatively simple source files
that result in these incredibly long debug values have been found, so
this assert has been changed to a condition that drops the debug value
if it is not met.
Differential Revision: https://reviews.llvm.org/D101373
In terms of readability, the `enum CFIMoveType` didn't clearly document what it
intends to convey, i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
GCC supports negative values for -mstack-protector-guard-offset=; this
should be a signed value. Pre-req to D100919.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101325
The .file directive was changed in D36018 to only contain the basename, for
ELF.
But on AIX, we require the .file directive to also contain the directory info.
This aligns with other AIX compilers like XLC and is required by some AIX tools
like DBX.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D99785
This reverts commit 0ce723cb22.
D76519 was not quite NFC. If we see a CFISection::Debug function before a
CFISection::EH one (-fexceptions -fno-asynchronous-unwind-tables), we may
incorrectly pick CFISection::Debug and emit a `.cfi_sections .debug_frame`.
We should use .eh_frame instead.
This scenario is untested.
https://reviews.llvm.org/D99400 set clang DefaultDebuggerTuning for AIX
to dbx. However, we still need to update the target default so that llc
and other tools will get the same default debugger tuning, and avoid
passing extra options in LTO.
Reviewed By: #powerpc, shchenz, dblaikie
Differential Revision: https://reviews.llvm.org/D101197
In terms of readability, the `enum CFIMoveType` didn't clearly document what it
intends to convey, i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
The data member 'shouldEmitMoves' is only used in DwarfCFIException::beginFunction(),
and 'shouldEmitCFI' in DwarfCFIExceptionBase serves its purpose.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101155
At the moment, MachineCSE allows CSE-ing convergent instrs which are
non-local to each other. This can cause illegal codegen as convergent
instrs are control flow dependent. The patch prevents non-local CSE of
convergent instrs by adding a check in isProfitableToCSE and rejecting
CSE-ing if we're considering CSE-ing non-local convergent instrs. We
can still CSE convergent instrs which are in the same control flow
scope, so the patch purposely does not make all convergent instrs
non-CSE candidates in isCSECandidate.
https://reviews.llvm.org/D101187
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e.
Make the following functions return void:
addLabel()
addSectionLabel()
addSectionDelta()
This aligns with the other attribute-adding functions.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101022
This is mostly NFC, except that for the end of a BB the slot itself is used
rather than the previous slot. Idx is used to find a def of the sibling live
interval in that slot. The def at the end of the MBB and at the slot just
before the end of the MBB should be the same, so it should be NFC.
Reviewers: reames, qcolombet, MatzeB, wmi, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100922
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.
Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.
Fixes PR50083
It is proper to relax the non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X, step_vector(-C))
Differential Revision: https://reviews.llvm.org/D100812
This change adds debug information about whether PGO is being used or
not.
Microsoft performance tooling (e.g. xperf, WPA) uses this information to
show whether functions are optimized with PGO or not, as well as whether
PGO information is invalid.
This information is useful for validating whether training scenarios are
providing good coverage of real world scenarios, showing if profile data
is out of date, etc.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99994
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.
(Already remarked by jhenderson on D70769.)
No behavior change.
Differential Revision: https://reviews.llvm.org/D100957
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D100851
PHIElimination may insert copy instructions in multiple basic
blocks. Moving debug locations across basic block boundaries would be
misleading as illustrated by the test case.
rdar://75463656
Differential Revision: https://reviews.llvm.org/D100886
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst an IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
This is currently built on top of the SelectionDAG call lowering, but
does not use it the same way. SelectionDAG passes legalized types to
the assignment functions, and the tablegenerated assignment functions
may change the value types expected for registers. This does not
change the types used, just moves the register creation to help fix
this in the future.
Defer the register creation until after all of the assignment
decisions have been made. This will also help have correct tail call
compatibility checking in a future change. Currently it does not work
as expected for any arguments split across multiple registers.
Re-land the patch with a fix for a clang test.
The cost of a spill location is computed based on the relative branch frequency
of the block where the corresponding spill/reload/copy is located.
While the number itself depends heavily on the incoming IR,
the total cost can be used when making changes in RA.
Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark""
This reverts commit 680f3d6de7.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros, which handles zero scalar vectors.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D100813
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. That
requirement did not permit its use on targets which have legal vector element
types larger than the largest legal scalar type, such as i64 vectors on RV32.
As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.
This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D100660
The cost of a spill location is computed based on the relative branch frequency
of the block where the corresponding spill/reload/copy is located.
While the number itself depends heavily on the incoming IR,
the total cost can be used when making changes in RA.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100020
Pseudo probes are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because lifetimes are enlarged by pseudo probe instructions. As a consequence, a program could get different code generated with and without pseudo probes. I'm closing the gap by excluding pseudo probes from slot indexing and from downstream register-allocation-related passes.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100334
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion are disabled for better counts quality. For these two optimizations, this is a no-diff change.
The counts quality with SPEC2017 before/after this change is unchanged.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100332
In an env that reuses compiler instances for multiple compilations, this
omission results in non-deterministic assembly output (names of the
auto-generated labels) if the order or full set of Modules compiled
varies.
Differential Revision: https://reviews.llvm.org/D100797
When the ProcResGroup has BufferSize=0,
1. if there is a subunit in the list of write resources for the
scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
scheduling class, choose a subunit to use instead of the ProcResGroup.
3. Having both the ProcResGroup and any of its subunits in the resources
implied by an InstRW is not supported.
This is used to model parallel uses from a pool of resources.
Differential Revision: https://reviews.llvm.org/D98976
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.
This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.
For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.
Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.
Differential Revision: https://reviews.llvm.org/D100398
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.
D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.
Reviewed By: dblaikie, rnk
Differential Revision: https://reviews.llvm.org/D100632
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.
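A minimal sketch of that constant-index fast path (hypothetical helper, not the
in-tree function; the non-constant and out-of-range cases are handled
differently in practice):
```
#include <algorithm>
#include <cstdint>

// MinNumElts is the vector's known minimum element count (its element count
// at vscale == 1).
uint64_t clampConstantIndex(uint64_t Idx, uint64_t MinNumElts, bool Scalable) {
  // For a scalable vector, an index below the minimum count is always in
  // bounds no matter what vscale turns out to be.
  if (Scalable && Idx < MinNumElts)
    return Idx;
  // Otherwise fall back to a conservative clamp to the last known element.
  return std::min(Idx, MinNumElts - 1);
}
```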
Differential Revision: https://reviews.llvm.org/D100639
Greedy RA adds copies of virtual registers when splitting a live interval.
This stat might be useful.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
If a virtual register is live in a landing pad, its def must be before the call
causing the exception, or it should be the statepoint instruction itself; in
that case the def actually means the relocation of a GC pointer and is live in
the landing pad.
The test shows this check triggering for an option under development,
use-registers-for-gc-values-in-landing-pad, which is off by default until it is
functionally correct.
Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100525
Such attributes can either be unset, or set to "true" or "false" (as a string).
Throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalizes the check, with the following
behavior:
no attribute or attribute set to "false" => return false
attribute set to "true" => return true
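A minimal sketch of that normalized behavior, written here as a free function
for illustration rather than the exact new API:
```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Unset or set to "false" => false; set to "true" => true.
static bool fnAttrAsBool(const Function &Fn, StringRef Kind) {
  Attribute A = Fn.getFnAttribute(Kind);
  return A.isValid() && A.getValueAsString() == "true";
}

// e.g. bool NoJumpTables = fnAttrAsBool(Fn, "no-jump-tables");
```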
Differential Revision: https://reviews.llvm.org/D99299
Instead of managing memory by hand, delegate it to std::vector. This makes the
code much simpler, and also avoids repeatedly computing the storage size.
According to valgrind --tool=callgrind, this also slightly decreases the
instruction count, but by a small margin.
This is a recommit of 82f0e3d3ea with one usage
fixed in llvm/lib/CodeGen/RegisterScavenging.cpp.
Note the slight API change: BitVector::clear() now has the same behavior as any
other container: it does not free memory, but it does set the size of the
BitVector to 0. It is thus incorrect to access its content right afterwards, a
scenario which wasn't enforced by the previous implementation.
Differential Revision: https://reviews.llvm.org/D100387
Add the `IsText` argument to `GetFile` and `GetFileOrSTDIN` which will help z/OS distinguish between text and binary correctly. This is an extension to [this patch](https://reviews.llvm.org/D97785)
Reviewed By: abhina.sreeskantharajan, amccarth
Differential Revision: https://reviews.llvm.org/D100488
When we pass an AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example:
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e.g. byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role previously performed by align, falling back to align if
stackalign is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
Patchpoint instructions have operands for which it is actually zero cost
(or the same cost as a register) to use the value from the stack.
In terms of statistics it makes sense to count them separately.
Move from counting computation instructions related to stack spill/reload to
counting the number of stack slots referenced.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100016
The statepoint instruction has a deopt section which is actually live through the call.
Currently this is handled by a special post-RA pass, fixup-statepoint-caller-saved.
This change teaches Greedy RA that if a segment of a live interval ends with a statepoint
instruction and its register is used in the deopt bundle, then this live interval interferes with the regmask of this statepoint,
and as a result a caller-saved register cannot be assigned to this live interval.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100296
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.
Similarly, the SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case, because such call
frames cannot be nested in LLVM, the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
We saw a big compile-time impact after enabling the debug entry value
feature for the X86 platform (D73534). Compile time goes from 900s to 1600s with
our testcase. It is caused by busily allocating and freeing memory.
'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'
The value type of this map is a vector, and we missed taking a reference when accessing an
element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();`, which is a DenseMap.
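A minimal standalone sketch of the copy-versus-reference issue described above (std::map and the names here are illustrative stand-ins for the MapVector/DenseMap in the actual code):
```
#include <map>
#include <vector>

int main() {
  std::map<unsigned, std::vector<int>> FwdRegWorklist{{0, {1, 2, 3}}};

  auto Copied = FwdRegWorklist[0];      // copies the vector on every access
  auto &Referenced = FwdRegWorklist[0]; // binds a reference; no allocation
  Referenced.push_back(4);              // updates the element in place
  (void)Copied;
  return 0;
}
```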
Reviewed by: djtodoro, flychen50
Differential Revision: https://reviews.llvm.org/D100162
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
pass managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99707
The main reason is preparation for transforming AliasResult into a class that contains
an offset for the PartialAlias case.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98027
These cases were failing before, but with cryptic asserts.
Add asserts in the RegScavenger that fail earlier with better
messages. NFC
Differential Revision: https://reviews.llvm.org/D100109
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().
Differential Revision: https://reviews.llvm.org/D99423
Combine all collected stats into a separate struct RAGreedyStats
with add and report methods.
The motivation is to extend the number of statistics captured; instead of
adding new parameters, just combine all of them into one structure.
Additionally, I plan to call report from different places in the future to report data
for the function as well.
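A minimal sketch of the pattern described above (the member names and the printf-based report are illustrative, not the actual RAGreedyStats definition, which reports through ORE):
```
#include <cstdio>

struct RAGreedyStatsSketch {
  unsigned Reloads = 0;
  unsigned Spills = 0;

  // Merge counters collected elsewhere (e.g. per block) into this instance.
  void add(const RAGreedyStatsSketch &Other) {
    Reloads += Other.Reloads;
    Spills += Other.Spills;
  }

  // Emit the combined statistics.
  void report() const { std::printf("reloads=%u spills=%u\n", Reloads, Spills); }
};
```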
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
To save compile time, avoid computing stats if ORE will not emit them.
The motivation is to add more stats and compute them only if they will be dumped.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
Summary: Set the default DwarfInlinedStrings to inlined strings for DBX, because DBX does not support the .dwstr section for now.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D99933
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D100002
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.
I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
a CONCAT_VECTORS of BUILD_VECTORs is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.
This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D99682
I do not see any bit-width restriction, from the point of view of the
LLVM Lang Ref - Operand Bundles, on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:
- File doc comments that aren't marked with \file, so they're attached to
the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
class.
- Code comments before the namespace that (in my opinion) shouldn't be
extracted by doxygen at all.
This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.
Reviewed By: thakis, mizvekov
Differential Revision: https://reviews.llvm.org/D96736
Follow up to a6d2a8d6f5. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
Problem:
On SystemZ we need to open text files in text mode. On Windows, opening files in text mode adds a CRLF '\r\n', which may not be desirable.
Solution:
This patch adds two new flags
- OF_CRLF which indicates that CRLF translation is used.
- OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.
Developers should now use either OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text; if they do want carriage returns on Windows, they should use OF_TextWithCRLF.
So this is the behaviour per platform with my patch:
z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode
Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return
The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
if (Flags & OF_CRLF)
CrtOpenFlags |= _O_TEXT;
```
The following files are the ones that still use OF_Text, which I left unchanged. I modified all of these except raw_ostream.cpp in recent patches, so I know they were previously in binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99426
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.
As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.
Differential Revision: https://reviews.llvm.org/D98294
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.
This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.
The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.
This allows us to do away with the following pattern in many of the SVE tests:
RUN: .... 2>%t
RUN: cat %t | FileCheck --check-prefix=WARN
WARN-NOT: warning: ...
The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.
This patch also fixes the following tests:
unittests:
ScalableVectorMVTsTest.SizeQueries
SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG
regression tests:
Transforms/InstCombine/vscale_gep.ll
Reviewed By: paulwalker-arm, ctetreau
Differential Revision: https://reviews.llvm.org/D98856
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.
The rest of the patch ensures that is the case: instead of clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).
Without the change in RegAllocGreedy.cpp, the compiler ices.
This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.
I will follow up with a subsequent patch to improve the usability and maintainability of Query.
Differential Revision: https://reviews.llvm.org/D98232
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.
This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on AArch64). The legalization can then
happen more directly, and the generated code gets a chance to avoid
wasting time computing results in types wider than necessary.
Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88785
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’:
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
3882 | Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
This patch adds 3 methods, one for power-of-2 vectors which uses tree
reductions with vector ops before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.
Differential Revision: https://reviews.llvm.org/D97163
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.
Test case IR is generated using Fortran front-end.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99273
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.
This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.
Differential Revision: https://reviews.llvm.org/D99230
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
its name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.
This patch attempts to clarify the situation by separating them and introducing
new names:
- shouldRealignStack - true if there is any reason the stack should be
realigned
- canRealignStack - true if we are still able to realign the stack (e.g. we
can still reserve/have reserved a frame pointer)
- hasStackRealignment = shouldRealignStack && canRealignStack (not target
customisable)
Targets can now override shouldRealignStack to indicate that stack realignment
is required.
This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).
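A simplified sketch of the relationship between the three queries (the real interface lives on TargetRegisterInfo and takes a MachineFunction; the signatures here are illustrative only):
```
struct StackRealignQueriesSketch {
  // True if there is any reason the stack should be realigned.
  virtual bool shouldRealignStack() const { return false; }
  // True if we are still able to realign the stack.
  virtual bool canRealignStack() const { return true; }
  // Not target-customisable: combines the two queries.
  bool hasStackRealignment() const {
    return shouldRealignStack() && canRealignStack();
  }
  virtual ~StackRealignQueriesSketch() = default;
};
```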
Differential Revision: https://reviews.llvm.org/D98716
Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
This is needed for Fortran assumed shape arrays whose dimensions are
defined as,
- 'count' is taken from array descriptor passed as parameter by
caller, access from descriptor is defined by type DIExpression.
- 'lowerBound' is defined by callee.
The current alternative represents this using upperBound in place of
count, where upperBound is calculated in the callee in a temporary variable
using lowerBound and count.
Representation with count (DIExpression) is not only clearer
compared to upperBound (DIVariable), it has the additional advantage that
count, being accessed through a parameter, has a better chance of
surviving at higher optimization levels than upperBound, which is a local
variable.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99335
Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis.
This simple change skips the BB address map emission for such functions.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D99395
D89239 adjusts the stack offset of emergency spill slots for overaligned
stacks. However the adjustment is not valid for targets whose stack
grows up (such as AMDGPU).
This change makes the adjustment conditional only to those targets whose
stack grows down.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49686
Differential Revision: https://reviews.llvm.org/D99504
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.
As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.
Differential Revision: https://reviews.llvm.org/D98939
This patch adds a new isIntOrFPConstant helper function to check if an
SDValue is an integer or FP constant. This pattern is used in various
places.
There are also places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.
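A sketch of what such a helper boils down to (assuming the SelectionDAG headers are available; this is not necessarily the exact in-tree definition):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// A single check replacing repeated isa<> pairs at call sites.
static bool isIntOrFPConstantSketch(SDValue V) {
  return isa<ConstantSDNode>(V) || isa<ConstantFPSDNode>(V);
}
```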
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D99428
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.
This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.
To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.
The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.
This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.
Code size improvements (Darwin):
- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)
Differential Revision: https://reviews.llvm.org/D99358
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.
rdar://75784009
Differential Revision: https://reviews.llvm.org/D99287
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98874
The statepoint instruction is known to have a variable and large number of operands.
It is possible that the Register Allocator will split live intervals in such a way that all
physical registers are occupied by "zero-length" live intervals which are marked
as not-spillable.
While intervals are marked as not-spillable at the moment of creation, when they really are
zero-length, it is possible that later, as part of re-materialization, a physical register will be
needed between the def and use of such a tiny interval (where the use is not
related to this interval at all).
As all physical registers are assigned to not-spillable intervals, there are no available
registers and RA reports an error.
The idea of the fix is to avoid marking as not-spillable those tiny live intervals that have a use in a statepoint
instruction's var args section. Such an interval may be perfectly spilled and folded into an
operand of the statepoint.
Reviewers: reames, dantrushin, qcolombet, dsanders, dmgreen
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98766
This avoids a temporary and a memcpy call when computing large expressions.
It's basically some kind of poor man's expression template, but it seems easier
to maintain to have a single generic `apply` call instead of the whole
expression template machinery here.
Differential Revision: https://reviews.llvm.org/D98176
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.
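A hedged usage sketch of the new builder hook (assuming the LLVM IR headers; the element type and element count are arbitrary choices for illustration):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

Value *emitStepVector(IRBuilder<> &Builder, LLVMContext &Ctx) {
  // <vscale x 4 x i32>: for scalable types this emits a call to
  // llvm.experimental.stepvector; fixed-width types get a ConstantVector.
  Type *VecTy = ScalableVectorType::get(Type::getInt32Ty(Ctx), 4);
  return Builder.CreateStepVector(VecTy, "step");
}
```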
For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as
mul step_vector(1), 2 -> step_vector(2)
add step_vector(1), step_vector(1) -> step_vector(2)
shl step_vector(1), 1 -> step_vector(2)
etc.
that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.
I've added cost model tests for both fixed width and scalable
vectors:
llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll
as well as codegen lowering tests for fixed width and scalable
vectors:
llvm/test/CodeGen/AArch64/neon-stepvector.ll
llvm/test/CodeGen/AArch64/sve-stepvector.ll
See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.
This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decided to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D96918
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.
I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
We've messed this up a few times recently on RISCV. Experiments
with these asserts found a couple of issues on other targets as well.
They've all been cleaned up now, so we can put in these asserts to
catch future issues.
I had to waive Glue because ADDC/ADDE/etc legalization replaces
Glue with i32 on at least AArch64. X86 used to do the same before
we switched to ADDCARRY. So I guess that's just how that works.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D98979
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits
So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just the sign bit. So, revert to the old code for zero_extend_vector_inreg cases.
There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG.
E.g, ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain
code that matches a bitfield extract from an and + right shift.
Rather than duplicating code in the same way, this adds two opcodes:
- G_UBFX (unsigned bitfield extract)
- G_SBFX (signed bitfield extract)
They work like this
```
%x = G_UBFX %y, %lsb, %width
```
Where `lsb` and `width` are
- The least-significant bit of the extraction
- The width of the extraction
This will extract `width` bits from `%y`, starting at `lsb`. G_UBFX zero-extends
the result, while G_SBFX sign-extends the result.
This should allow us to use the combiner to match the bitfield extraction
patterns rather than duplicating pattern-matching code in each target.
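A scalar illustration of the semantics described above (not the GlobalISel implementation; this sketch assumes the width is nonzero and less than 32):
```
#include <cstdint>
#include <cstdio>

uint32_t ubfx(uint32_t Y, unsigned Lsb, unsigned Width) {
  return (Y >> Lsb) & ((1u << Width) - 1);        // zero-extended field
}

int32_t sbfx(uint32_t Y, unsigned Lsb, unsigned Width) {
  uint32_t Field = (Y >> Lsb) & ((1u << Width) - 1);
  uint32_t SignBit = 1u << (Width - 1);
  return (int32_t)((Field ^ SignBit) - SignBit);  // sign-extended field
}

int main() {
  std::printf("%u %d\n", ubfx(0xF0u, 4, 4), sbfx(0xF0u, 4, 4)); // prints: 15 -1
  return 0;
}
```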
Differential Revision: https://reviews.llvm.org/D98464
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.
Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.
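For reference, the 'extend + multiply + extract high bits' pattern mentioned above, written out for a scalar 32-bit unsigned multiply-high (an illustration only, not the KnownBits code):
```
#include <cstdint>

uint32_t mulhu32(uint32_t A, uint32_t B) {
  // Widen, multiply, then take the top half of the 64-bit product.
  return (uint32_t)(((uint64_t)A * (uint64_t)B) >> 32);
}
```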
Differential Revision: https://reviews.llvm.org/D98857
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.
Differential Revision: https://reviews.llvm.org/D98646
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.
Noticed while investigating the poor codegen in D98587.
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.
Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).
This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).