Commit Graph

1668 Commits

Author SHA1 Message Date
Joe Ellis 67464dfe36 [DebugInfo] Only perform TypeSize -> unsigned cast when necessary
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D96423
2021-02-11 13:54:09 +00:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Kazu Hirata 7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Thomas Preud'homme a50ab8672d Revert STRICT_FCMP nonan optimisation
Summary: This reverts commit b7b61a7b5b which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806

Reviewers:

Subscribers:
2021-02-09 11:27:35 +00:00
Thomas Preud'homme b7b61a7b5b Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-09 11:18:16 +00:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Kerry McLaughlin 9b4fcfaa9e [SVE][CodeGen] Remove performMaskedGatherScatterCombine
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.

This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94525
2021-02-01 14:10:00 +00:00
Serge Pavlov bf416d166b [FPEnv] Intrinsic for setting rounding mode
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.

* It creates unnecessary dependency on libc. On the other hand, setting
  rounding mode requires few instructions and could be made by compiler.
  Sometimes standard C library even is not available, like in the case of
  GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
  call just sets rounding mode.

This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:

* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
  returns non-zero value in the case of failure. In glibc 'fesetround'
  reports failure if its argument is invalid or unsupported or if floating
  point operations are unavailable on the hardware. Compiler usually knows
  what core it generates code for and it can validate arguments in many
  cases.
* Rounding mode is specified in 'fesetround' using constants like
  'FE_TONEAREST', which are target dependent. It is inconvenient to work
  with such constants at IR level.

C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.

This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.

Differential Revision: https://reviews.llvm.org/D74729
2021-02-01 11:28:14 +07:00
Fangrui Song d5bbaaaf95 [XRay] Make __xray_customevent support non-Linux 2021-01-25 00:48:21 -08:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Craig Topper 79e798aca3 Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This recommits 2c51bef76c.

I've fixed the broken check line from when I renamed the test function.

Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
2021-01-18 11:08:28 -08:00
Craig Topper 5d431c3d32 Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This reverts commit 2c51bef76c.

I seem to have messed up the check lines in the test.
2021-01-18 11:00:20 -08:00
Craig Topper 2c51bef76c [RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results.
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.

Differential Revision: https://reviews.llvm.org/D94149
2021-01-18 10:41:36 -08:00
Jeroen Dobbelaere 668827b648 Introduce llvm.noalias.decl intrinsic
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.

Reviewed By: nikic, jdoerfert

Differential Revision: https://reviews.llvm.org/D93039
2021-01-16 09:20:45 +01:00
Heejin Ahn 9e4eadeb13 [WebAssembly] Update basic EH instructions for the new spec
This implements basic instructions for the new spec.

- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
 - `catch` needs a custom routine for the same reason `throw` needs one,
   to encode `__cpp_exception` tag symbol.
- Updates `WebAssembly::isCatch` utility function to include `catch_all`
  and Change code that compares an instruction's opcode with `catch` to
  use that function.
- LateEHPrepare
  - Previously in LateEHPrepare we added `catch` instruction to both
    `catchpad`s (for user catches) and `cleanuppad`s (for destructors).
    In the new version `catch` is generated from `llvm.catch` intrinsic
    in instruction selection phase, so we only need to add `catch_all`
    to the beginning of cleanup pads.
  - `catch` is generated from instruction selection, but we need to
    hoist the `catch` instruction to the beginning of every EH pad,
    because `catch` can be in the middle of the EH pad or even in a
    split BB from it after various code transformations.
  - Removes `addExceptionExtraction` function, which was used to
    generate `br_on_exn` before.
- CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this
  function on the new instruction causes crashes, and the new version
  will be added in a later CL, whose contents will be completely
  different. So deleting the whole function will make the diff easier to
  read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
  single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
  `br_on_exn` instructions from the tests and FileCheck lines.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94040
2021-01-09 01:48:06 -08:00
Heejin Ahn 7be271537e [WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
  foo();
} catch (int n) {
  ...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94038
2021-01-08 06:55:04 -08:00
Juneyoung Lee 5cdf6ed744 [CodeGen] recognize select form of and/ors when splitting branch conditions
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.

This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93853
2021-01-01 04:46:10 +09:00
Kazu Hirata 1e3ed09165 [CodeGen] Use llvm::append_range (NFC) 2020-12-28 19:55:16 -08:00
Bjorn Pettersson a89d751fb4 Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:

    i19 @llvm.fptsi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:

    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something
like this instead:

    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.

Original patch by @nikic and @ebevhan (based on D54696).

Differential Revision: https://reviews.llvm.org/D54749
2020-12-18 11:09:41 +01:00
Matt Arsenault 2e0e03c6a0 OpaquePtr: Require byval on x86_intrcc parameter 0
Currently the backend special cases x86_intrcc and treats the first
parameter as byval. Make the IR require byval for this parameter to
remove this special case, and avoid the dependence on the pointee
element type.

Fixes bug 46672.

I'm not sure the IR is enforcing all the calling convention
constraints. clang seems to ignore the attribute for empty parameter
lists, but the IR tolerates it.
2020-12-14 16:34:37 -05:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Simon Moll 3ffbc79357 [VP] Build VP SDNodes
Translate VP intrinsics to VP_* SDNodes.  The tests check whether a
matching vp_* SDNode is emitted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91441
2020-12-09 11:36:51 +01:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
Kazu Hirata a553ac9791 [CodeGen] llvm::erase_if (NFC) 2020-12-05 15:44:40 -08:00
Francesco Petrogalli f6150aa41a [SelectionDAGBuilder] Update signature of `getRegsAndSizes()`.
The mapping between registers and relative size has been updated to
use TypeSize to account for the size of scalable EVTs.

The patch is a NFCI, if not for the fact that with this change the
function `getUnderlyingArgRegs` does not raise a warning for implicit
conversion of `TypeSize` to `unsigned` when generating machine code
from the test added to the patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92096
2020-11-30 17:38:51 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Craig Topper 2d6042937b [SelectionDAGBuilder] Add SPF_NABS support to visitSelect
We currently don't match this which limits the effectiveness of D91120 until
InstCombine starts canonicalizing to llvm.abs. This should be easy to remove
if/when we remove the SPF_ABS handling.

Differential Revision: https://reviews.llvm.org/D92118
2020-11-25 14:54:26 -08:00
Hongtao Yu d0e42037bf [CSSPGO] MIR target-independent pseudo instruction for pseudo-probe intrinsic
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.

```
bb.0.bb0:
   frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
   TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
   PSEUDO_PROBE 837061429793323041, 1, 0
   $edi = MOV32ri 1, debug-location !13; test.c:0
   JCC_1 %bb.1, 4, implicit $eflags

bb.2.bb2:
   PSEUDO_PROBE 837061429793323041, 3, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ

bb.1.bb1:
   PSEUDO_PROBE 837061429793323041, 2, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ
```

The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86495
2020-11-20 10:52:43 -08:00
Craig Topper a7eae62a42 [SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846
2020-11-20 10:06:53 -08:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00
Florian Hahn 1983acce7c
[SelDAGBuilder] Do not require simple VTs for constraints.
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.

using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
  asm volatile("" : "+r,m"(m) : : "memory");
}

One case that use "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.

This patch updates visitInlineAsm so that it use MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.

And we still report an error if there are no registers to satisfy the
constraint.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91710
2020-11-19 09:31:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
Amara Emerson e72cfd938f Rename the VECREDUCE_STRICT_{FADD,FMUL} SDNodes to VECREDUCE_SEQ_{FADD,FMUL}.
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.

Differential Revision: https://reviews.llvm.org/D88791
2020-10-07 10:45:09 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Dávid Bolvanský 179e15d53a [SystemZ] Optimize bcmp calls (PR47420)
Solves https://bugs.llvm.org/show_bug.cgi?id=47420

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87988
2020-09-25 17:55:39 +02:00
Bill Wendling 0c0c57f7b2 Revert "[CodeGen] Postprocess PHI nodes for callbr"
Accidental commit.

This reverts commit 7f4c940bd0.
2020-09-24 14:35:23 -07:00
Bill Wendling 7f4c940bd0 [CodeGen] Postprocess PHI nodes for callbr
When processing PHI nodes after a callbr, we need to make sure that the
PHI nodes on the default branch are resolved after the callbr
(inserted after INLINEASM_BR). The PHI node values on the indirect
branches are processed before the INLINEASM_BR.

Differential Revision: https://reviews.llvm.org/D86260
2020-09-24 14:34:28 -07:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of vlaues in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type convertions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
inspite of, the regular Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper convertion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contanined in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Simon Pilgrim bee79cdcc6 SelectionDAGBuilder.h - remove unnecessary includes. NFCI.
Reduce to forward declarations and move implicit dependencies down to the cpp files.
2020-09-15 12:18:22 +01:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Craig Topper 844e94a502 [SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI
This was a missed simplication in D87200
2020-09-08 15:50:12 -07:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Lucas Prates 3d943bcd22 [CodeGen] Properly propagating Calling Convention information when lowering vector arguments
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.

This patch fixes the issue by properly propagating the Calling
Convention details.

This fixes Bugzilla #47001.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86715
2020-08-27 17:01:10 +01:00
aartbik 72305a08ff [llvm] [DAG] Fix bug in llvm.get.active.lane.mask lowering
This intrinsic only accepted proper machine vector lengths.
Fixed by this change. With unit tests.

https://bugs.llvm.org/show_bug.cgi?id=47299

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D86585
2020-08-26 10:16:31 -07:00
Sjoerd Meijer 39522b1e10 [SelectionDAG] Legalize intrinsic get.active.lane.mask
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the
backedge-taken count.

Differential Revision: https://reviews.llvm.org/D86302
2020-08-25 15:00:10 +01:00
Jay Foad 0819a6416f [SelectionDAG] Better legalization for FSHL and FSHR
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.

Differential Revision: https://reviews.llvm.org/D77152
2020-08-21 10:32:49 +01:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Bevin Hansson 5de6c56f7e [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
Summary:
This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

Reviewers: leonardchan, craig.topper, bjope, jdoerfert

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83216
2020-08-07 15:09:24 +02:00
Jay Foad 28e322ea93 [PowerPC] Custom lowering for funnel shifts
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.

Differential Revision: https://reviews.llvm.org/D83948
2020-08-04 16:30:49 +01:00
Cameron McInally 31c7a2fd5c [FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder.
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D84056
2020-08-03 10:22:25 -05:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Jon Roelofs afae6d97fa [SelectionDAG] Fix lowering of vector geps
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).

Fixes: rdar://66016901

Differential Revision: https://reviews.llvm.org/D84884
2020-07-30 14:56:53 -06:00
Nikita Popov deb4bb2b3a [IR] Add min/max/abs intrinsics
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.

Differential Revision: https://reviews.llvm.org/D84125
2020-07-23 20:56:19 +02:00
Simon Pilgrim fa95688237 SelectionDAGBuilder.cpp - remove duplicate includes that already exist in SelectionDAGBuilder.h. NFC. 2020-07-22 14:19:41 +01:00
Matt Arsenault f659c44016 CodeGen: Add support for lowering byref attribute 2020-07-21 17:38:15 -04:00
Matt Arsenault 023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
Tim Northover 5165b2b5fd AArch64+ARM: make LLVM consider system registers volatile.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.

This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
2020-07-15 09:47:36 +01:00
Christopher Tetreault ff5b9a7b3b [SVE] Remove calls to VectorType::getNumElements from CodeGen
Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm

Reviewed By: RKSimon

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82210
2020-07-09 12:43:36 -07:00
Heejin Ahn 7e6793aa33 [WebAssembly] Generate unreachable after __stack_chk_fail
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D83277
2020-07-08 01:02:05 -07:00
David Sherwood c061e56e88 [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll
This patch fixes all remaining warnings in:

  llvm/test/CodeGen/AArch64/sve-trunc.ll
  llvm/test/CodeGen/AArch64/sve-vector-splat.ll

I hit some warnings related to getCopyPartsToVector. I fixed two
issues:

1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.

Differential Revision: https://reviews.llvm.org/D83028
2020-07-07 09:21:47 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Sanjay Patel bc110de78a [SelectionDAG] don't split branch on logic-of-vector-compares
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.

The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.

For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.

Differential Revision: https://reviews.llvm.org/D82602
2020-07-02 17:05:24 -04:00
David Sherwood c7df35d2b2 [CodeGen] Fix warnings in getCopyToPartsVector
Whilst trying to assemble the following test:

  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c

I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.

Differential Revision: https://reviews.llvm.org/D82744
2020-07-02 09:08:20 +01:00
James Y Knight 4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Guillaume Chatelet d3085c2501 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82956
2020-07-01 14:31:56 +00:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
Sjoerd Meijer 243a5329d4 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Michael Liao b1360caa82 [SDAG] Add new AssertAlign ISD node.
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
  where these alignments are retrieved from alignment attributes in LLVM
  IR. These tracked alignments could help DAG combining and lowering
  generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
  we only generate AssertAlign nodes on return values from intrinsic
  calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
  new (base + offset) patterns.

Reviewers: arsenm, bogner

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81711
2020-06-23 00:51:11 -04:00
Lucas Prates a255931c40 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
Simon Wallis 4dba59689d [ARM] prologue instructions emitted for naked function with >64 byte argument
Summary:

The naked function attribute is meant to suppress all function
prologue/epilogue instructions.

On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.

Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().

Checking in ARMFrameLowering::determineCalleeSaves() is too late.

A test case is included.

Reviewers: llvm-commits, olista01, danielkiss

Reviewed By: danielkiss

Subscribers: kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80715

Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
2020-06-09 11:33:03 +01:00
Christopher Tetreault caa2fddce7 [SVE] Eliminate calls to default-false VectorType::get() from CodeGen
Reviewers: efriedma, c-rhodes, david-arm, spatel, craig.topper, aqjune, paquette, arsenm, gchatelet

Reviewed By: spatel, gchatelet

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80313
2020-06-08 10:26:10 -07:00
Philip Reames 4c735439fd [Statepoint] Migrate a few tests to gc-live bundle format and fix assert
The assert was missed in 0e7c7705, migrating the test revealed the problem.
2020-06-04 18:15:58 -07:00
Henry Kao c57e41c000 [CodeGen][SVE] Replace deprecated calls in getCopyFromPartsVector()
Summary: Replaced getVectorNumElements() with getVectorElementCount(). Added operator overloads for class ElementCount. Fixes warning in several AArch64 unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80826
2020-06-03 11:20:02 -04:00
David Sherwood d8a78889f6 [CodeGen] Fix warning in visitShuffleVector
Make sure we only ask for the number of elements after we've
bailed out for scalable vectors.

Differential revision: https://reviews.llvm.org/D80632
2020-05-29 17:09:59 +01:00
Philip Reames 98a87c65a3 [Statepoint] Reduce scope of usage of ImmutableStatepoint
Can't quite fully remove it yet as some more items need sunk the GCStatepointInst class from the wrapper, but we can at least reduce scope.
2020-05-27 18:57:42 -07:00
Philip Reames 87bea912c2 [Statepoint] Replace uses of isX functions with idiomatic isa<X>
Now that all of the statepoint related routines have classes with isa support, let's cleanup.

I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
2020-05-27 18:32:28 -07:00
Philip Reames 1af3705c7f Start migrating away from statepoint's inline length prefixed argument bundles
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.

This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.

Differential Revision: https://reviews.llvm.org/D8059
2020-05-27 09:16:10 -07:00
Serge Pavlov 4d20e31f73 [FPEnv] Intrinsic llvm.roundeven
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.

Differential Revision: https://reviews.llvm.org/D75670
2020-05-26 19:24:58 +07:00
Craig Topper 7392820f98 [Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.

I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.

I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.

Differential Revision: https://reviews.llvm.org/D80455
2020-05-22 21:54:28 -07:00
Arthur Eubanks 8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc69.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Arthur Eubanks 810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00