Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.
Example:
// If x and/or y is a splat value:
lastX (binop (x, y)) --> binop(lastX(x), lastX(y))
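The rewrite can be illustrated with a toy scalar model. This is only a sketch: `lastActive`, `vecBinop`, `extractAfterBinop` and `binopAfterExtract` are hypothetical helpers, not LLVM APIs, and only the "last active element" flavour (lastb-like) is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model of a lastb-like operation: value in the last active lane.
// Assumes at least one lane of the predicate is active.
int lastActive(const std::vector<bool> &pred, const std::vector<int> &v) {
  assert(pred.size() == v.size());
  for (std::size_t i = v.size(); i-- > 0;)
    if (pred[i])
      return v[i];
  assert(false && "no active lane");
  return 0;
}

// Element-wise binary operation over two equally sized vectors.
std::vector<int> vecBinop(const std::function<int(int, int)> &op,
                          const std::vector<int> &x,
                          const std::vector<int> &y) {
  assert(x.size() == y.size());
  std::vector<int> r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = op(x[i], y[i]);
  return r;
}

// Before the combine: vector binop first, then extract one element.
int extractAfterBinop(const std::vector<bool> &pred,
                      const std::vector<int> &x, const std::vector<int> &y,
                      const std::function<int(int, int)> &op) {
  return lastActive(pred, vecBinop(op, x, y));
}

// After the combine: extract from each operand, then one scalar binop.
int binopAfterExtract(const std::vector<bool> &pred,
                      const std::vector<int> &x, const std::vector<int> &y,
                      const std::function<int(int, int)> &op) {
  return op(lastActive(pred, x), lastActive(pred, y));
}
```

Both forms give the same value for an element-wise binop. When one operand is a splat, its extract is presumably just the splatted scalar, which is where the saving would come from.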
Differential Revision: https://reviews.llvm.org/D106932
Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32. So just stick this in target-specific code,
at least for now.
Differential Revision: https://reviews.llvm.org/D107608
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different than the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.
Differential Revision: https://reviews.llvm.org/D107576
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".
This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splats earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107449
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is an NFC change because
actually we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here is just making the relations between three RegClass-es more
explicit.
The decoder function and table are the same as FPR128, use that instead.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107644
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.
This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.
My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107400
This is the data to be stored so it should be an input.
To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.
This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107309
Previously, ADD & ADDA (as well as SUB & SUBA) instructions were mixed
together, which not only violated Motorola assembly syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions and migrates the rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.
Note that we observed minor regressions in codegen quality: sometimes
isel uses ADD instead of ADDA even when the latter can lead to a shorter
code sequence. This issue implies that some isel patterns might need
to be updated.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.
This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
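A minimal sketch of the fix-up, using a software model of the conversion. `modelFcvt` is a hypothetical stand-in for the hardware instruction (32-bit result, round-toward-zero assumed), not the actual lowering code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model of the conversion described above: saturates on
// infinity / out-of-range input and returns the maximum integer for NaN.
int32_t modelFcvt(float x) {
  const int32_t Max = std::numeric_limits<int32_t>::max();
  const int32_t Min = std::numeric_limits<int32_t>::min();
  if (std::isnan(x))
    return Max;
  if (x >= 2147483648.0f)
    return Max;
  if (x <= -2147483648.0f)
    return Min;
  return static_cast<int32_t>(x); // in range: truncates toward zero
}

// Saturating conversion with the semantics the ISD opcodes require:
// identical to the model above, except that NaN must produce 0.
int32_t fpToSIntSat(float x) {
  int32_t converted = modelFcvt(x);
  return std::isnan(x) ? 0 : converted; // fix up the NaN case at the end
}
```

Only the NaN input needs the extra select; every other input already has the required result.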
We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107230
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it, or, depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.
This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.
Fixes PR51371.
Differential Revision: https://reviews.llvm.org/D107689
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
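The overflow check can be sketched on a narrower type. This is an illustration only: the 32-bit width and the names `MulResult` and `smulWithOverflow` are assumptions made for the example, with the "full product" held in a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

struct MulResult {
  uint32_t LowBits; // low half of the product (raw two's complement bits)
  bool Overflow;    // true if the product does not fit in 32 signed bits
};

MulResult smulWithOverflow(int32_t a, int32_t b) {
  // Compute the full double-width product.
  int64_t full = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  uint64_t bits = static_cast<uint64_t>(full);
  uint32_t low = static_cast<uint32_t>(bits);
  uint32_t high = static_cast<uint32_t>(bits >> 32);
  // The high half must be all copies of the low half's sign bit.
  uint32_t signFill = (low & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return {low, high != signFill};
}
```

For example, `smulWithOverflow(65536, 65536)` reports overflow because the high half is 1 while the low half is non-negative.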
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.
Differential Revision: https://reviews.llvm.org/D107476
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.
I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.
The getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in the incorrect packing of the per-element bits data.
This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.
Differential Revision: https://reviews.llvm.org/D107158
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.
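One way a 64-bit leading-zero count could be built from two 32-bit halves is sketched below. `modelFfbh` is a hypothetical software model that returns -1 (all ones) for a zero input, and the min/saturating-add combination is an assumption for illustration, not necessarily the sequence the patch emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Software model of a 32-bit "find first bit high": number of leading
// zeros, with a defined result of -1 (all ones) when the input is zero.
uint32_t modelFfbh(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Unsigned saturating add, so that -1 + 32 stays at the maximum value.
uint32_t uaddSat(uint32_t a, uint32_t b) {
  uint32_t s = a + b;
  return s < a ? 0xFFFFFFFFu : s;
}

// 64-bit count-leading-zeros from the two halves; returns 64 for zero.
uint32_t ctlz64(uint64_t x) {
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t r = std::min(modelFfbh(hi), uaddSat(modelFfbh(lo), 32u));
  return std::min(r, 64u);
}
```

Because a zero half yields the largest unsigned value, the unsigned minimum picks the other half automatically and no explicit compare against zero is needed.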
Differential Revision: https://reviews.llvm.org/D107442
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.
Differential Revision: https://reviews.llvm.org/D107441
This is a recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in the call to getFastMathFlags (the base type should be
FPMathOperator, not Instruction). The original commit message is duplicated below:
Clang has a builtin function '__builtin_isnan', which implements the C
library function 'isnan'. This function is currently implemented entirely
in clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
the instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. However, it is not suitable if floating
point exceptions are tracked. The corresponding IEEE 754 operation and C
function must never raise an FP exception, even if the argument is a
signaling NaN. Compare instructions usually do not have this
property; they raise the 'invalid' exception in such a case. So this
mechanism is unsuitable when exception behavior is strict. In
particular, it could result in unexpected trapping if the argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, even though some can do the check in a more efficient way.
* The solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Currently only SystemZ implements this hook, and it
generates a call to a target specific intrinsic function.
Although these mechanisms allow 'isnan' to be implemented with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang; in this case the treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption, however, should not be applied
to the functions that check FP number properties, including 'isnan'. If
such a function returns the expected result instead of actually making
the check, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
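For illustration, a NaN check made with integer operations might look like the sketch below. It assumes IEEE-754 binary32 `float`; `isNaNByBits` is a hypothetical name and is not the code generated by any of the patches mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// NaN test using only integer operations, so no FP compare is executed
// and no FP exception can be raised, even for a signaling NaN.
bool isNaNByBits(float x) {
  static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Drop the sign bit. A NaN has an all-ones exponent and a non-zero
  // mantissa, so its magnitude bits are strictly above those of infinity.
  return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}
```

This is the kind of replacement a user has to write by hand when 'isnan' itself is folded away.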
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by the IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies the requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.
Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, albeit it didn't impact correctness there from the looks of it.
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.
A regression test is also added that exercises this functionality.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D107086
Similar cleanup to G_EXTRACT (51bd4e874f).
Also swap the order of clamp/widen to avoid unnecessary complex merges.
Add a bunch of missing testcases to legalize-inserts while we're at it.
Differential Revision: https://reviews.llvm.org/D107601
Similar to other cleanup commits which widen instructions before clamping
during legalization. Purpose of this is to avoid weird type breakdowns.
In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.
Differential Revision: https://reviews.llvm.org/D107604
Using REG_SEQUENCE produces better code than INSERT_SUBREG,
we can omit one move instruction in many cases.
Fixes: SWDEV-298028
Differential Revision: https://reviews.llvm.org/D107602
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp
to check if a longjmp occurred and if so jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.
This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107530
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.
In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.
Differential Revision: https://reviews.llvm.org/D106239
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.
Add some testcases for known legal types + testcases for s4 and s88.
Differential Revision: https://reviews.llvm.org/D107607
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.
This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.
There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.
This also removes some checks which, nowadays, are handled by the
MachineVerifier.
Differential Revision: https://reviews.llvm.org/D107505
This is re-landing the same patch again, but without the changes to
LegalizerHelper that regressed the Mips test:
test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll
Differential revision: https://reviews.llvm.org/D106494
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.
This has no other intentional changes.
This is a recommit of 35c0848b57,
that was reverted to simplify reversion of an earlier change.
G_CONCAT_VECTORS shows up from time to time when legalizing other instructions.
We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it
as legal gives us selection for free.
Differential Revision: https://reviews.llvm.org/D107512
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.
This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.
This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.
Fixes PR50053
Differential Revision: https://reviews.llvm.org/D107068
As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'.
Differential Revision: https://reviews.llvm.org/D107459
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.
Differential Revision: https://reviews.llvm.org/D105979
Emit references to '__do_global_ctors' and '__do_global_dtors' to allow
constructor/destructor routines to run.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D107133
Kuniyuki Iwashima reported in [1] that llvm compiler may
convert a loop exit condition with "i < bound" to "i != bound", where
"i" is the loop index variable and "bound" is the upper bound.
In case that "bound" is not a constant, verifier will always have "i != bound"
true, which will cause verifier failure since to verifier this is
an infinite loop.
The fix is to avoid transforming "i < bound" to "i != bound".
In llvm, the transformation is done by IndVarSimplify pass.
The compiler checks loop condition cost (i = i + 1) and if the
cost is lower, it may transform "i < bound" to "i != bound".
This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo
class to return a higher cost for such an operation, which
will prevent the transformation for the test case
added in this patch.
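The two loop shapes in question can be sketched as follows. The functions are hypothetical and written in plain C++ rather than as a BPF program; they only show that both exit tests compute the same result for a non-negative bound, while only the first states an upper bound on "i" explicitly.

```cpp
#include <cassert>

// Source form: the exit test keeps "i" visibly below "bound".
int sumLessThan(const int *data, int bound) {
  int sum = 0;
  for (int i = 0; i < bound; ++i)
    sum += data[i];
  return sum;
}

// Transformed form: the same result when bound >= 0, but the exit test
// alone no longer says that "i" stays below "bound".
int sumNotEqual(const int *data, int bound) {
  assert(bound >= 0 && "the forms are only equivalent for bound >= 0");
  int sum = 0;
  for (int i = 0; i != bound; ++i)
    sum += data[i];
  return sum;
}
```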
[1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/
Differential Revision: https://reviews.llvm.org/D107483
Clamp the max number of elements when legalizing G_PHI. This allows us to
legalize some common fallbacks like 4 x s64.
Here's an example: https://godbolt.org/z/6YocsEYTd
Had to add -global-isel-abort=0 to legalize-phi.mir to account for the
G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI.
Differential Revision: https://reviews.llvm.org/D107508
`catch` instruction can have any number of result values depending on
its tag, but so far we have only needed a single i32 return value for
C++ exception so the instruction was specified that way. But using the
instruction for SjLj handling requires multiple return values.
This makes `catch` instruction's results variadic and moves selection of
`throw` and `catch` instruction from ISelLowering to ISelDAGToDAG.
Moving `catch` to ISelDAGToDAG is necessary because I am not aware of
a good way to do instruction selection for variadic output instructions
in TableGen. This also moves `throw` because 1. `throw` and `catch`
share the same utility function and 2. there is really no reason we
should do that in ISelLowering in the first place. What we do is mostly
the same in both places, and moving them to ISelDAGToDAG allows us to
remove unnecessary mid-level nodes for `throw` and `catch` in
WebAssemblyISD.def and WebAssemblyInstrInfo.td.
This also adds handling for new `catch` instruction to AsmTypeCheck.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D107423
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D106512
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.
https://godbolt.org/z/9xqbP46Mz
Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.
Differential Revision: https://reviews.llvm.org/D107417
Going through our legalization rules and doing some cleanup.
Widening and then clamping is usually easier than clamping and then widening.
This allows us to legalize some weird types like s88.
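A toy model of the two orderings, using bit widths only. The helpers below are hypothetical and are not the LegalizerInfo API; they assume the maximum width is a power of two.

```cpp
#include <cassert>
#include <vector>

unsigned nextPowerOfTwo(unsigned bits) {
  unsigned p = 1;
  while (p < bits)
    p <<= 1;
  return p;
}

// Widen to a power of two first, then split into maxBits-sized pieces.
std::vector<unsigned> widenThenClamp(unsigned bits, unsigned maxBits) {
  unsigned widened = nextPowerOfTwo(bits);
  if (widened <= maxBits)
    return {widened};
  return std::vector<unsigned>(widened / maxBits, maxBits);
}

// Split off maxBits-sized pieces first, then widen the leftover.
std::vector<unsigned> clampThenWiden(unsigned bits, unsigned maxBits) {
  std::vector<unsigned> pieces;
  while (bits > maxBits) {
    pieces.push_back(maxBits);
    bits -= maxBits;
  }
  if (bits != 0)
    pieces.push_back(nextPowerOfTwo(bits));
  return pieces;
}
```

In this model, s88 with a 64-bit maximum becomes two uniform s64 pieces when widened first (88 -> 128 -> 2 x 64), but an s64 plus a leftover s24 (rounded up to s32) when clamped first.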
Differential Revision: https://reviews.llvm.org/D107413
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)
This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).
Differential Revision: https://reviews.llvm.org/D107367
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is cast to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.
The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.
This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.
Differential Revision: https://reviews.llvm.org/D106549
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788
The fix is to provide missing expansions for:
case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_TO_SINT:
A test case is provided.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107452
I want to hoist `Mask` variable higher up,
but then it would clash with this one.
So let's rename this one first.
There are no other intentional changes here other than said rename.
This assert is intended to ensure that the high registers are not
selected when it is passed to one of the thumb UXT instructions. However
it was triggering even for 32 bit where no UXT instruction is emitted.
Fixes PR51313.
Differential Revision: https://reviews.llvm.org/D107363
Given a shuffle mask, if it is picking from an input that is splat
given the current granularity of the shuffle, then adjust the mask
to pick from the same lane of the input as the mask element is in.
This may result in a shuffle being simplified into a blend.
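The mask adjustment can be sketched for a two-input shuffle at element granularity. `retargetSplatLanes` and `isBlendMask` are hypothetical helpers; the real code works on the shuffle's current granularity, which is ignored here.

```cpp
#include <cassert>
#include <vector>

// Mask convention: for N = mask.size(), an element m in [0, N) picks lane
// m of input 0, m in [N, 2N) picks lane m - N of input 1, and a negative
// value means undef.
std::vector<int> retargetSplatLanes(std::vector<int> mask, bool in0IsSplat,
                                    bool in1IsSplat) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i) {
    int m = mask[i];
    if (m < 0)
      continue; // undef stays undef
    int input = m / n;
    bool isSplat = (input == 0) ? in0IsSplat : in1IsSplat;
    // Every lane of a splat holds the same value, so pick the lane that
    // matches the position of this mask element.
    if (isSplat)
      mask[i] = input * n + i;
  }
  return mask;
}

// A blend takes each result lane from the same lane of one of the inputs.
bool isBlendMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i)
    if (mask[i] >= 0 && mask[i] % n != i)
      return false;
  return true;
}
```

With input 0 known to be a splat, the mask `<0,5,0,7>` becomes `<0,5,2,7>`, which is a blend.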
I believe this is correct given that the splat detection matches the one
just above the new code.
My basic thought is that we might be able to get fewer regressions
by handling multiple insertions of the same value into a vector
if we form broadcasts+blend here, as opposed to D105390,
but I have not really thought this through,
and did not try implementing it yet.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107009
Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match.
I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together.
Differential Revision: https://reviews.llvm.org/D107370
If there's a region of the stack reserved for potential tail call arguments
(only the case when we guarantee tail calls will be honoured), this is right
next to the incoming stored return address, not necessarily next to the
callee-saved area, so combining the two into a single figure leads to incorrect
offsets in some edge cases.
While collecting reachable callees (from kernels), ignore any call graph node
which does not have an associated function or whose associated function is not
a definition.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D107329
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
planning to add a separate `wasm.catch.longjmp` intrinsic which
returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test for a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
number of elements
- Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
soon going to add `EnableWasmSjLj`, so this makes the distinction
clearer
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D107405
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.
We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.
Fixes PR51208
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107314
This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to
an fmov of the appropriate type. At least on some CPUs with 64bit NEON
data paths this is expected to be faster, and shouldn't be slower on any
CPU that treats fmov as a register rename.
Differential Revision: https://reviews.llvm.org/D106365
Return false from runOnFunction if nothing changed. Curiously
we already returned a bool from detectAndFoldOffset, but didn't
use it.
Fix a couple breaks after returns that I saw while auditing
detectAndFoldOffset.
Differential Revision: https://reviews.llvm.org/D107303
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.
A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.
As it stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.
While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there are even some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.
Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.
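As a rough illustration, a sequence with a power-of-two denominator can be produced from the element index with a multiply and a shift. `stepSequence` is a hypothetical helper covering only non-negative steps; it is not the RVV lowering itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Element i of the sequence is (i * numerator) / denominator, where the
// denominator must be a power of two so the division becomes a shift.
std::vector<uint64_t> stepSequence(unsigned numElts, uint64_t numerator,
                                   unsigned denominator) {
  assert(denominator != 0 && (denominator & (denominator - 1)) == 0 &&
         "denominator must be a power of two");
  unsigned shift = 0;
  while ((1u << shift) < denominator)
    ++shift;
  std::vector<uint64_t> seq(numElts);
  for (unsigned i = 0; i < numElts; ++i)
    seq[i] = (static_cast<uint64_t>(i) * numerator) >> shift;
  return seq;
}
```

Here `stepSequence(4, 1, 2)` gives `<0,0,1,1>`, the index sequence divided by 2, and `stepSequence(4, 2, 1)` gives `<0,2,4,6>`.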
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106533
Add a comment when there is a shifted value,
add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.
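The rule can be sketched as a small helper. `immComment` is a hypothetical function shown only to make the shifted value concrete; it is not the instruction printer's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Return the comment text for an add/sub immediate: the effective value
// when a shift is applied, and nothing when the immediate is unshifted.
std::string immComment(uint64_t imm, unsigned shift) {
  if (shift == 0)
    return ""; // the printed immediate already is the effective value
  return "=" + std::to_string(imm << shift);
}
```

For `#291, lsl #12` this yields `=1191936` (291 * 4096), and for an unshifted `#256` it yields an empty comment.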
Differential Revision: https://reviews.llvm.org/D107196
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51217
Differential Revision: https://reviews.llvm.org/D106868
Add new pass LowerRefTypesIntPtrConv to generate debugtrap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has since been allowed (see ac81cb7e6).
Differential Revision: https://reviews.llvm.org/D107102
If a vsetvli instruction is not compatible with the next vector instruction,
and there are no other things that may update or use VL/VTYPE, we could merge
it with the next vsetvli instruction that should be inserted for the vector
instruction.
This commit only merges VTYPE with the former vsetvli instruction which has
the same VL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106857
I'm not sure this is the best way to approach this,
but the situation is hard to detect unless we explicitly call it out when refusing to advise unrolling.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
When a value is expected to be extended, we should emit an extended load rather
than a normal G_LOAD.
Add checklines to arm64-abi.ll which show that we now emit the correct loads.
For ease of comparison: https://godbolt.org/z/8WvY6EfdE
Differential Revision: https://reviews.llvm.org/D107313
Add new pass LowerRefTypesIntPtrConv to generate trap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has since been allowed (see ac81cb7e6).
Differential Revision: https://reviews.llvm.org/D107102
This optimizes out the mask when the result of a bitmask is interpreted as an i8
or i16 value. Resolves PR50507.
Differential Revision: https://reviews.llvm.org/D107103
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand; this includes "concat_vectors" patterns, as can be seen in the Arm shuffle cost changes. This also helped fix an x86 issue with irregular/length-changing SK_InsertSubvector costs. I'm hoping this will help with D107188.
This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.
The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.
The widening case will be addressed in a follow up patch that treats the cost as 0.
Masks with a high number of undef elts will still struggle to match optimal subvector widths - it's currently bounded by the minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.
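To make the matched mask shape concrete, here is a small Python sketch of the idea. It is an assumption-laden illustration, not the C++ helper from the patch: the function name `match_insert_subvector`, its return value and its undef handling are invented for this example, with -1 standing in for an undef mask element.

```python
def match_insert_subvector(mask, num_src_elts):
    """Return (num_sub_elts, index) if mask inserts the lowest elements of
    one source into the other, in-place source; otherwise return None."""
    src0 = [i for i, m in enumerate(mask) if 0 <= m < num_src_elts]
    src1 = [i for i, m in enumerate(mask) if m >= num_src_elts]
    if not src0 or not src1:
        return None  # 1-op shuffles are deliberately not handled here

    src0_in_place = all(mask[i] == i for i in src0)
    src1_in_place = all(mask[i] == i + num_src_elts for i in src1)

    def lowest_elts_run(lanes, base):
        # The lanes taken from the inserted source must span one run that
        # reads that source's elements 0, 1, 2, ... (undef lanes allowed).
        lo, hi = lanes[0], lanes[-1] + 1
        ok = all(mask[i] < 0 or mask[i] == base + (i - lo)
                 for i in range(lo, hi))
        return (hi - lo, lo) if ok else None

    if src0_in_place:
        hit = lowest_elts_run(src1, num_src_elts)
        if hit:
            return hit
    if src1_in_place:
        hit = lowest_elts_run(src0, 0)
        if hit:
            return hit
    return None
```

For two 4-element sources, the mask `[0, 1, 4, 5]` is reported as a 2-element insertion at index 2, and the concat-style mask `[0, 1, 2, 3, 4, 5, 6, 7]` as a 4-element insertion at index 4. A blend such as `[0, 5, 2, 7]` is not matched.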
Differential Revision: https://reviews.llvm.org/D107228
If the block target for a WLSTP instruction is known to be out of range,
and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a
DLSTP (and cmp/branch) to still allow the creation of tail predicated
loops. That is what this patch does, adding extra revert code to the
fallback path of ARMBlockPlacementPass.
Due to the code produced when reverting, this creates a DLSTP between a
Bcc and a Br. As a DLS isn't necessarily a terminator we need to split
the block to move the DLS/Br into.
Differential Revision: https://reviews.llvm.org/D104709
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D107211
Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.
For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.
This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway. If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).
Differential Revision: https://reviews.llvm.org/D88994
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.
Differential Revision: https://reviews.llvm.org/D94097
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.
Differential Revision: https://reviews.llvm.org/D106569
This adds a combine for adds of reductions, distributing them so that
they occur sequentially to enable better use of accumulating VADDVA
instructions. It combines:
add(X, add(vecreduce(Y), vecreduce(Z))) ->
add(add(X, vecreduce(Y)), vecreduce(Z))
and
add(add(A, reduce(B)), add(C, reduce(D))) ->
add(add(add(A, C), reduce(B)), reduce(D))
These together distribute the adds so that more reductions can be
selected to VADDVA.
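The two rewrites can be modelled on a toy expression tree. The Python below
is only a sketch of the reassociation, not the DAG combine itself: the tuple
encoding and the names `distribute` and `evaluate` are assumptions made for
this example.

```python
def is_add(e):
    return isinstance(e, tuple) and e[0] == "add"


def is_reduce(e):
    return isinstance(e, tuple) and e[0] == "reduce"


def distribute(e):
    """Apply one of the two rewrites at the root of e, if it matches."""
    if not is_add(e):
        return e
    _, lhs, rhs = e
    # add(add(A, reduce(B)), add(C, reduce(D)))
    #   -> add(add(add(A, C), reduce(B)), reduce(D))
    if is_add(lhs) and is_reduce(lhs[2]) and is_add(rhs) and is_reduce(rhs[2]):
        a, red_b = lhs[1], lhs[2]
        c, red_d = rhs[1], rhs[2]
        return ("add", ("add", ("add", a, c), red_b), red_d)
    # add(X, add(reduce(Y), reduce(Z))) -> add(add(X, reduce(Y)), reduce(Z))
    if is_add(rhs) and is_reduce(rhs[1]) and is_reduce(rhs[2]):
        return ("add", ("add", lhs, rhs[1]), rhs[2])
    return e


def evaluate(e, env):
    """Evaluate a tree; leaves name scalars, reduce operands name vectors."""
    if isinstance(e, str):
        return env[e]
    if e[0] == "reduce":
        return sum(env[e[1]])
    return evaluate(e[1], env) + evaluate(e[2], env)
```

After the rewrite every reduction is added onto a running scalar, which is
the accumulate-into-a-register shape; `evaluate` can be used to check that
the value is unchanged.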
Differential Revision: https://reviews.llvm.org/D106532
Under MVE we can use VADDV/VADDVA's to perform integer add reductions,
so it can be beneficial to use more reductions than summing subvectors
and reducing once. Especially for VMLAV/VMLAVA the mul can be
incorporated into the reduction, producing fewer instructions.
Some of the test cases currently get larger due to extra integer adds,
but will be improved in a followup patch.
Differential Revision: https://reviews.llvm.org/D106531
The Scalable Matrix Extension (SME) introduces a new execution mode
called Streaming SVE mode. In streaming mode a substantial subset of the
SVE and SVE2 instruction set is available, along with new outer product,
load, store, extract and insert instructions that operate on the new
architectural register state for the matrix.
To support streaming mode this patch introduces a new subtarget feature
+streaming-sve. If enabled, the streaming-legal subset of SVE(2)
instructions is available. The existing behaviour for SVE(2) remains
unchanged: the subset of instructions that are legal in streaming mode is
enabled if either +sve[2] or +streaming-sve is specified. Instructions that are
illegal in streaming mode remain predicated on +sve[2].
The SME target feature has been updated to imply +streaming-sve rather
than +sve.
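The resulting predicate logic can be summarised with a short Python sketch.
This is a simplified model, not the TableGen predicates from the patch: it
treats +sve and +sve2 as a single "sve" feature, and `expand_features` and
`is_available` are hypothetical names.

```python
def expand_features(features):
    """Apply the implication described above: sme implies streaming-sve."""
    feats = set(features)
    if "sme" in feats:
        feats.add("streaming-sve")
    return feats


def is_available(features, legal_in_streaming):
    """Whether an SVE instruction is enabled for the given feature set."""
    feats = expand_features(features)
    if legal_in_streaming:
        # Streaming-legal instructions accept either feature.
        return "sve" in feats or "streaming-sve" in feats
    # Streaming-illegal instructions stay predicated on sve alone.
    return "sve" in feats
```

For example, `is_available({"sme"}, True)` holds, while
`is_available({"sme"}, False)` does not, since SME no longer implies +sve.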
The following changes are made to the SVE(2) tests:
* For instructions that are legal in streaming mode:
- added RUN line to verify +streaming-sve enables the instruction.
- updated diagnostic to 'instruction requires: streaming-sve or sve'.
* For instructions that are illegal in streaming-mode:
- added RUN line to verify +streaming-sve does not enable the
instruction.
SVE(2) instructions that are legal in streaming mode have:
if !HaveSVE[2]() && !HaveSME() then UNDEFINED;
at the top of the pseudocode in the XML.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D106272
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
Register hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregister of the UACC is not a VSRp.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D107101
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.
Differential Revision: https://reviews.llvm.org/D107118
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.
The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:
    public func f(b: Bool) {
      guard b else {
        return // I would like to set a breakpoint here.
      }
      ...
    }
The patch modifies two places in GlobalISel: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.
Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.
rdar://79515454
This tentatively reapplies the patch without modifications; the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.
Differential Revision: https://reviews.llvm.org/D105238
Same as 91bd3ad128, this doesn't really
change anything but gives the registers better names than the ones
tablegen would define. And fills in the missing gaps.
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter (e.g. p0.b rather than p0.d).
Fixes PR51182.
Differential Revision: https://reviews.llvm.org/D106943
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.
This fixes a Vulkan CTS test that would hang otherwise.
Differential Revision: https://reviews.llvm.org/D105709
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.
While this optimization could do with a proper cost model to weigh the
benefits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.
Until then, this patch fixes at least one known obvious deficiency.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106963
Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.
Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.
We probably need a similar change for G_STORE, but it seems to be a bit more
finicky. So, let's just handle G_LOAD for now.
Differential Revision: https://reviews.llvm.org/D107013
We were handling types like s88 as follows:
1) clamp to the range
2) widen to the next power of 2
This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.
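The arithmetic behind the two orderings can be sketched in Python. This is
only a model of the sizes involved, not the LegalizerInfo rules; `next_pow2`
and `split` are hypothetical helpers, and `split` simply carves a width into
chunks of at most `max_bits`.

```python
def next_pow2(bits):
    """Smallest power of two that is >= bits."""
    return 1 << (bits - 1).bit_length()


def split(bits, max_bits):
    """Break a scalar width into max_bits-sized pieces plus any leftover."""
    pieces = [max_bits] * (bits // max_bits)
    if bits % max_bits:
        pieces.append(bits % max_bits)
    return pieces


# Clamping s88 to s64 first leaves an awkward s24 remainder.
clamp_first = split(88, 64)

# Widening s88 to s128 first gives two clean s64 halves.
widen_first = split(next_pow2(88), 64)
```

Here `clamp_first` is `[64, 24]` and `widen_first` is `[64, 64]`.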
Differential Revision: https://reviews.llvm.org/D106998
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.
This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D106754
In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.
I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.
Differential Revision: https://reviews.llvm.org/D106918
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.
- The Arm implementation of getExtendedAddReductionCost is altered to
only provide costs for legal or smaller types. Larger than legal types
need to be split, which currently does not work very well, especially
for predicated reductions where the predicate may be legal but needs to
be split. Currently we limit it to legal or smaller input types.
- The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)))
is a pattern that can come up, and can be treated the same as
reduce(mul(ext, ext)) providing the extension types match.
- And it has been adjusted to not count the ext in reduce(mul(ext, ext))
as part of a reduce(mul) pattern.
Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.
Differential Revision: https://reviews.llvm.org/D106166
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D106380
This reverts commit 1cfecf4fc4.
This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD.
This is not a perfect revert. The new function is left as other uses of
it exist now.