This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.
Differential Revision: https://reviews.llvm.org/D91722
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on a immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.
For example:
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count
These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.
This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker and Cullen Rhodes.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94708
This patch adds partial support in Instruction Selection for dbg.values that use
a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but
adds the logic for processing DIArgLists within the ISel pass. This change is
largely focused on handleDebugValue and some of the functions that it calls.
Outside of this, salvageDebugInfo and transferDbgValues have been modified to
replace individual operands instead of the entire value; dangling debug info for
variadic debug values is not currently supported (but may be added later).
Differential Revision: https://reviews.llvm.org/D88589
This patch modifies the class that represents debug values during ISel,
SDDbgValue, to support multiple location operands (to represent a dbg.value that
uses a DIArgList). Part of this class's functionality has been split off into a
new class, SDDbgOperand.
The new class SDDbgOperand represents a single value, corresponding to an SSA
value or MachineOperand in the IR and MIR respectively. Members of SDDbgValue
that were previously related to that specific value (as opposed to the
variable or DIExpression), such as the Kind enum, have been moved to
SDDbgOperand. SDDbgValue now contains an array of SDDbgOperand instead, allowing
it to hold more than one of these values.
All changes outside SDDbgValue are simply updates to use the new interface.
Differential Revision: https://reviews.llvm.org/D88585
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.
Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.
Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.
The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D97459
This patch adds a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.
This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.
This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.
The change includes a test that when the module flag is enabled the section is correctly generated.
The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).
In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.
Change originally authored by Daniel Frampton <dframpto@microsoft.com>
For more details, see MSVC documentation for `/guard:ehcont`
https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D94835
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D91972
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.
Reviewed By: DavidTruby
Differential Revision: https://reviews.llvm.org/D96423
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:
1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95982
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D91972
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.
This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D94525
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.
* It creates unnecessary dependency on libc. On the other hand, setting
rounding mode requires few instructions and could be made by compiler.
Sometimes standard C library even is not available, like in the case of
GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
call just sets rounding mode.
This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns non-zero value in the case of failure. In glibc 'fesetround'
reports failure if its argument is invalid or unsupported or if floating
point operations are unavailable on the hardware. Compiler usually knows
what core it generates code for and it can validate arguments in many
cases.
* Rounding mode is specified in 'fesetround' using constants like
'FE_TONEAREST', which are target dependent. It is inconvenient to work
with such constants at IR level.
C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.
This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
Differential Revision: https://reviews.llvm.org/D74729
This recommits 2c51bef76c.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D93039
This implements basic instructions for the new spec.
- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
- `catch` needs a custom routine for the same reason `throw` needs one,
to encode `__cpp_exception` tag symbol.
- Updates `WebAssembly::isCatch` utility function to include `catch_all`
and Change code that compares an instruction's opcode with `catch` to
use that function.
- LateEHPrepare
- Previously in LateEHPrepare we added `catch` instruction to both
`catchpad`s (for user catches) and `cleanuppad`s (for destructors).
In the new version `catch` is generated from `llvm.catch` intrinsic
in instruction selection phase, so we only need to add `catch_all`
to the beginning of cleanup pads.
- `catch` is generated from instruction selection, but we need to
hoist the `catch` instruction to the beginning of every EH pad,
because `catch` can be in the middle of the EH pad or even in a
split BB from it after various code transformations.
- Removes `addExceptionExtraction` function, which was used to
generate `br_on_exn` before.
- CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this
function on the new instruction causes crashes, and the new version
will be added in a later CL, whose contents will be completely
different. So deleting the whole function will make the diff easier to
read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
`br_on_exn` instructions from the tests and FileCheck lines.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94040
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.
This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93853
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support
vector operands:
i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc
On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:
i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:
f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something
like this instead:
i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
Currently the backend special cases x86_intrcc and treats the first
parameter as byval. Make the IR require byval for this parameter to
remove this special case, and avoid the dependence on the pointee
element type.
Fixes bug 46672.
I'm not sure the IR is enforcing all the calling convention
constraints. clang seems to ignore the attribute for empty parameter
lists, but the IR tolerates it.
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91084
This commit adds two new intrinsics.
- llvm.experimental.vector.insert: used to insert a vector into another
vector starting at a given index.
- llvm.experimental.vector.extract: used to extract a subvector from a
larger vector starting from a given index.
The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D91362
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)
Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91092
The mapping between registers and relative size has been updated to
use TypeSize to account for the size of scalable EVTs.
The patch is a NFCI, if not for the fact that with this change the
function `getUnderlyingArgRegs` does not raise a warning for implicit
conversion of `TypeSize` to `unsigned` when generating machine code
from the test added to the patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92096
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).
This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.
Differential Revision: https://reviews.llvm.org/D91649
We currently don't match this which limits the effectiveness of D91120 until
InstCombine starts canonicalizing to llvm.abs. This should be easy to remove
if/when we remove the SPF_ABS handling.
Differential Revision: https://reviews.llvm.org/D92118
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.
```
bb.0.bb0:
frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
PSEUDO_PROBE 837061429793323041, 1, 0
$edi = MOV32ri 1, debug-location !13; test.c:0
JCC_1 %bb.1, 4, implicit $eflags
bb.2.bb2:
PSEUDO_PROBE 837061429793323041, 3, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
bb.1.bb1:
PSEUDO_PROBE 837061429793323041, 2, 0
PSEUDO_PROBE 837061429793323041, 4, 0
$rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
RETQ
```
The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86495
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.
using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
asm volatile("" : "+r,m"(m) : : "memory");
}
One case that use "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.
This patch updates visitInlineAsm so that it use MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.
And we still report an error if there are no registers to satisfy the
constraint.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D91710
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)
Changes included in this patch:
- Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
- Added the getCanonicalIndexType function to convert redundant addressing
modes (e.g. scaling is redundant when accessing bytes)
- Tests with 32 & 64-bit scaled & unscaled offsets
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90941
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).
This is the first in a series of patches which adds support for scalable masked scatters
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90939
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.
Differential Revision: https://reviews.llvm.org/D88482
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.
In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.
Differential Revision: https://reviews.llvm.org/D88654
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.
Differential Revision: https://reviews.llvm.org/D88791
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
When processing PHI nodes after a callbr, we need to make sure that the
PHI nodes on the default branch are resolved after the callbr
(inserted after INLINEASM_BR). The PHI node values on the indirect
branches are processed before the INLINEASM_BR.
Differential Revision: https://reviews.llvm.org/D86260
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.
The issue could be observed in a scenario such as:
```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```
This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.
This fixes Bugzilla #47454.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87844
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.
This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.
This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.
There is more that needs to be done in this area as discussed here:
Differential Revision: https://reviews.llvm.org/D86871
Review: Ulrich Weigand, Sanjay Patel
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.
This patch fixes the issue by properly propagating the Calling
Convention details.
This fixes Bugzilla #47001.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86715
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the
backedge-taken count.
Differential Revision: https://reviews.llvm.org/D86302
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
Differential Revision: https://reviews.llvm.org/D83948
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D84056
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
Differential Revision: https://reviews.llvm.org/D84125
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyPartsToVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
Differential Revision: https://reviews.llvm.org/D83028
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
Differential Revision: https://reviews.llvm.org/D82602
Whilst trying to assemble the following test:
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c
I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.
Differential Revision: https://reviews.llvm.org/D82744
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
Differential Revision: https://reviews.llvm.org/D82292
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
Summary:
The naked function attribute is meant to suppress all function
prologue/epilogue instructions.
On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.
Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().
Checking in ARMFrameLowering::determineCalleeSaves() is too late.
A test case is included.
Reviewers: llvm-commits, olista01, danielkiss
Reviewed By: danielkiss
Subscribers: kristof.beyls, hiraditya, danielkiss
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80715
Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.
This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.
Differential Revision: https://reviews.llvm.org/D8059
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.
For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.
In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>)
Reviewers: sdesmalen, efriedma, huntergr
Reviewed By: efriedma
Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78812
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and and were to falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D78636
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b
into our tree broke this test. Fix by using TLI.getFrameIndexTy().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: dexonsmith, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: cfe-commits, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D77278
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.
This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.
CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.
Differential Revision: https://reviews.llvm.org/D76947
The "Align" passed into getMachineMemOperand etc. is the alignment of
the MachinePointerInfo, not the alignment of the memory operation.
(getAlign() on a MachineMemOperand automatically reduces the alignment
to account for this.)
We were passing on wrong (overconservative) alignment in a bunch of
places. Fix a bunch of these, mostly in legalization. And while I'm
here, switch to the new Align APIs.
The test changes are all scheduling changes: the biggest effect of
preserving large alignments is that it improves alias analysis, so the
scheduler has more freedom.
(I was originally just trying to do a minor cleanup in
SelectionDAGBuilder, but I accidentally went deeper down the rabbit
hole.)
Differential Revision: https://reviews.llvm.org/D77687
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.
Example:
int foo() {
register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
__asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
return a;
}
In contrast, GCC issues an error in the same scenario.
This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.
This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76848
I only left it at the interface to ParseConstraints since that
needs updates to other callers in different files. I'll do that
as a follow up.
Differential Revision: https://reviews.llvm.org/D77892
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: stoklund, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77272
These should not be assuming address space 0. Calling getPointerTy is
generally the wrong thing to do, since you should already know the
type from the incoming IR.
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
predecessor list was empty, so they appeared to have no entry.
This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two. This requires careful
handing of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handing of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.
The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.
This patch fixes 2 pre-existing bugs, and adds tests.
The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).
The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated. The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:
for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
continue;
PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);
was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.
SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().
Reviewers: void, craig.topper, efriedma
Reviewed By: void, efriedma
Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76961
The previous code used the type of the first field for the VT
passed to getNode for every field.
I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.
Differential Revision: https://reviews.llvm.org/D77093
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
- DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
- This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
- Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77537
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.
The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.
`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.
Author: csyonghe <yonghe@google.com>
Reviewers: tpr, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76440
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.
This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.
An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.
Differential Revision: https://reviews.llvm.org/D76649
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76551
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
I believe we were previously calculating a pointer info with the scalar base and an offset of 0. But that's not really where the gather is pointing. The offset is a function of the indices of the GEP we looked through.
Also set the size of the MachineMemOperand to UnknownSize
Differential Revision: https://reviews.llvm.org/D76157
Summary:
callbr's indirect branches aren't expected to be taken, so reduce their
probabilities to 0 while increasing the default destination to 1. This
allows some code improvements through block placement.
Reviewers: nickdesaulniers
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72656
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
Summary:
Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."
This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.
To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:
- Any physical registers defined by "INLINEASM_BR" need to be marked as
live-in into the block with the "COPY" instructions. This violates an
assumption that physical registers aren't marked as "live-in" until after
register allocation. But it seems as if the live-in information only
needs to be correct after register allocation. So we're able to get away
with this.
- The indirect branches from the "INLINEASM_BR" are moved to the "COPY"
block. This is to satisfy PHI nodes.
I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.
Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner
Reviewed By: nickdesaulniers, MaskRay, lattner
Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D69868
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:
```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```
These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.
Reviewers: bjope, leonardchan, craig.topper
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71550
This includes both GEPs where the indexed type is a scalable vector, and
GEPs where the result type is a scalable vector.
Differential Revision: https://reviews.llvm.org/D73602
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.
Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish
Reviewed By: lebedev.ri
Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74454
This reverts commit ed29dbaafa.
I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.
Conflicts:
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/DebugInfo/X86/dbg-addr-dse.ll
SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.
There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
This reverts commit 3137fe4d23.
I'm backing out D68945, which this patch is a follow up for. It'll be
re-landed when D68945 is fixed.
The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of location now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html
The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.
It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.
This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.
Differential Revision: https://reviews.llvm.org/D73749
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
and macro FUNCTION likewise. NFCI.
Some functions like fmuladd don't really have a node, we should divide
the declaration form those have node to avoid introducing fake nodes.
Differential Revision: https://reviews.llvm.org/D72871
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.
After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).
To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.
The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.
NFC if no constrained FP intrinsic is present.
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.
This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This allows not only to solve the above
problem, but also to relax chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72341
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.
Reviewers: jyknight, nickdesaulniers, hfinkel
Reviewed By: jyknight, nickdesaulniers
Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69876
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.
This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.
In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.
To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:
- For machine nodes, check the mayRaiseFPException property of
the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode
The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71841
Summary:
We noticed in Julia that the sequence below no longer turned into
a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6.
```
%29 = fmul contract <4 x double> %wide.load, %wide.load16
%30 = fmul contract <4 x double> %wide.load13, %wide.load17
%31 = fmul contract <4 x double> %wide.load14, %wide.load18
%32 = fmul contract <4 x double> %wide.load15, %wide.load19
%33 = fadd fast <4 x double> %vec.phi, %29
%34 = fadd fast <4 x double> %vec.phi10, %30
%35 = fadd fast <4 x double> %vec.phi11, %31
%36 = fadd fast <4 x double> %vec.phi12, %32
```
Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function
attribute, but rather emits more local instruction flags.
This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further.
Reviewers: spatel, mcberg2017
Reviewed By: spatel
Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71495
getTargetConstant prevents any optimizations from operating on the
value and basically says its already been iseled. But since we
want the index to be in a register, this isn't true.
Prior to this we were generating a vbroadcast with an immediate
argument which is illegal and was flagged by the expensive checks
bot.
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.
The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.
There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
Summary: Add calculation for elements in structures in getting uniform
base for the Gather/Scatter intrinsic.
Reviewers: craig.topper, c-rhodes, RKSimon
Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71442
Summary:
To find potential opportunities to use getMemBasePlusOffset() I looked at
all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr
in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert
the files in the individual backends too.
The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.
We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.
Reviewers: spatel
Reviewed By: spatel
Subscribers: merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71207
During SelectionDAG, if a value which is associated with a DBG_VALUE
needs to be split across multiple registers, the DBG_VALUE will be split
into a set of fragment expressions to recreate the original value.
If one or more of these fragments cannot be created, they would
previously be silently dropped, causing the old debug value to live past
its expiry date. This patch fixes this issue by keeping invalid
fragments while setting their value as Undef.
Differential revision: https://reviews.llvm.org/D70248
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.
This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.
In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening
3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations. This was not possible in the past due to the
problems described under 1) and 2) above.
With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.
Differential Revision: https://reviews.llvm.org/D70913
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
This is recommit of commit e6584b2b7b, which was reverted in
30e7ee3c4b together with af57dbf12e.
Original message is below.
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was however inconvenient as required to
include constrained intrinsics definitions even if they were not needed.
Also using long scope prefix reduced readability.
This change moves these definitioins to the namespace llvm::fp.
No functional changes.
Differential Revision: https://reviews.llvm.org/D69552
Summary
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.
Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69887
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.
This reverts commits af57dbf12e and e6584b2b7b
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was however inconvenient as required to
include constrained intrinsics definitions even if they were not needed.
Also using long scope prefix reduced readability.
This change moves these definitioins to the namespace llvm::fp.
No functional changes.
Differential Revision: https://reviews.llvm.org/D69552
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments. That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.
Extracted from the patch for D69281.
NFC.
From SelectionDAGs point of view, debug variable locations specified with
dbg.declare and dbg.addr are indirect -- they specify the address of
something. But calling conventions might mean that a Value is placed on
the stack somewhere, and this too is indirection. Previously this was
mixed up in the "IsIndirect" field of DBG_VALUE insts; this patch
separates them by encoding the indirection in a DIExpression.
If we have a dbg.declare or dbg.addr, then the expression produces an
address that then becomes a DWARF memory location. We can represent
this by putting a DW_OP_deref on the _end_ of the expression. If a Value
has been placed on the stack, then we need to put a DW_OP_deref on the
_start_ of the expression, to load the Value from the stack and have
the rest of the expression operate on it.
Differential Revision: https://reviews.llvm.org/D69028
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is
techncally redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.
Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the insts DIExpression. There should now be no appearences of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:
DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)
With one where the indirection is in the DIExpression, by _appending_
a deref:
DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)
Which both mean the same thing.
Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.
Differential Revision: https://reviews.llvm.org/D68945
llvm-svn: 374877
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
Add own version of the mathematical constants from the upcoming C++20 `std::numbers`.
Differential revision: https://reviews.llvm.org/D68257
llvm-svn: 374207
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
llvm-svn: 373814
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.
> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131
llvm-svn: 373477
This is modeled after the same functionality for jump tables, which was
added in r357067.
Differential revision: https://reviews.llvm.org/D68131
llvm-svn: 373431
This intrinsics should be shift by immediate, but gcc allows any
i32 scalar and clang needs to match that. So we try to detect the
non-constant case and move the data from an integer register to an
MMX register.
Previously this was done by creating a v2i32 build_vector and
bitcast in SelectionDAGBuilder. This had to be done early since
v2i32 isn't a legal type. The bitcast+build_vector would be DAG
combined to X86ISD::MMX_MOVW2D which isel will turn into a
GPR->MMX MOVD.
This commit just moves the whole thing to lowering and emits
the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The
test changes just seem to be due to nodes being linearized in a
different order.
llvm-svn: 372535
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
This code was changed to accomodate fp128 being softened to itself
during type legalization on x86-64. This was done in order to create
libcalls while having fp128 as a legal type. We're now doing the
libcall creation during LegalizeDAG and the type legalization changes
to enable the old behavior have been removed. So this change to
SelectionDAGBuilder is no longer needed.
llvm-svn: 371771
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context.
Note that this patch is NFC unless the experimental flag is set.
The basic strategy I plan on taking is:
introduce infrastructure and a flag for testing (this patch)
Audit uses of isVolatile, and apply isAtomic conservatively*
piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection
flip the flag at the end (with minimal diffs)
Work through todo list identified in (2) and (3) exposing performance ops
(*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there.
We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL).
Differential Revision: https://reviews.llvm.org/D66309
llvm-svn: 371441
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.
This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.
Differential Revision: https://reviews.llvm.org/D66978
llvm-svn: 370430
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.
Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.
Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by: Craig Topper
Differential Revision: http://reviews.llvm.org/D63782
llvm-svn: 370228
ConstantDataVector is a specialized verison of ConstantVector
that stores data in a packed array of bits instead of as
individual pointers to other Constants. But we really shouldn't
expose that if we can void it. And we should handle regular
ConstantVector equally well.
This removes a dyn_cast to ConstantDataVector and just calls
getSplatValue directly on a Constant* if the type is a vector.
llvm-svn: 370018