Commit Graph

4237 Commits

Author SHA1 Message Date
Oliver Stannard 3676ef1053 [ARM][GISel] Treat calls as variadic even if only fixed arguments provided
For the ARM hard-float calling convention, calls to variadic functions
need to be treated diffrently, even if only the fixed arguments are
provided.

This fixes GCC-C-execute-pr68390 in the test-suite, which is failing on
the ARM GlobaISel bot.
2021-01-15 09:37:01 +00:00
Sam Tebbs 60fda8ebb6 [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch
Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken.

Differential Revision: https://reviews.llvm.org/D92385
2021-01-13 17:23:00 +00:00
Mircea Trofin 05e90cefeb [NFC] Disallow unused prefixes under llvm/test/CodeGen
This patch finishes addressing unused prefixes under CodeGen: 2
remaining tests fixed, and then undo-ing the lit.local.cfg changes under
various subdirs and moving the policy under CodeGen.

Differential Revision: https://reviews.llvm.org/D94430
2021-01-11 12:32:18 -08:00
Paul Robinson be179b9946 [FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option
This option is not used for anything after #c161665 (D91737).
This commit reapplies #a474657.
2021-01-11 09:32:49 -08:00
Paul Robinson c161775dec [FastISel] Flush local value map on every instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction.  This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block.  Neither of those consequences is good
for debugging.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.

Differential Revision: https://reviews.llvm.org/D91734
2021-01-11 08:32:36 -08:00
Paul Robinson e5eb5c8a7f NFC: Use -LABEL more
There were a number of tests needing updates for D91734, and I added a
bunch of LABEL directives to help track down where those had to go.
These directives are an improvement independent of the functional
patch, so I'm committing them as their own separate patch.
2021-01-11 08:14:58 -08:00
David Green 1ae762469f [ARM] Update and regenerate test checks. NFC 2021-01-08 14:54:16 +00:00
David Green 63dce70b79 [ARM] Handle any extend whilst lowering addw/addl/subw/subl
Same as a9b6440edd, use zanyext to treat any_extends as zero extends
during lowering to create addw/addl/subw/subl nodes.

Differential Revision: https://reviews.llvm.org/D93835
2021-01-06 11:26:39 +00:00
David Green ddb82fc76c [ARM] Handle any extend whilst lowering mull
Similar to 78d8a821e2 but for ARM, this handles any_extend whilst
creating MULL nodes, treating them as zextends.

Differential Revision: https://reviews.llvm.org/D93834
2021-01-06 10:51:12 +00:00
David Green 0c59a4da59 [ARM][AArch64] Some extra test to show anyextend lowering. NFC 2021-01-05 17:34:23 +00:00
Roman Lebedev 2461cdb417
[CodeGen][SimplifyCFG] Teach DwarfEHPrepare to preserve DomTree
Once the default for SimplifyCFG flips, we can no longer pass nullptr
instead of DomTree to SimplifyCFG, so we need to propagate it here.

We don't strictly need to actually preserve DomTree in DwarfEHPrepare,
but we might as well do it, since it's trivial.
2021-01-02 01:01:19 +03:00
Roman Lebedev b23b1bcc26
[NFC][CodeGen][Tests] Mark all tests that fail to preserve DomTree for SimplifyCFG as such
These tests start to fail when the SimplifyCFG's default regarding DomTree
updating is switched on, so mark them as needing changes.
2021-01-02 01:01:19 +03:00
Fangrui Song 2047c10c22 [TargetMachine] Drop implied dso_local for definitions in ELF static relocation model/PIE
TargetMachine::shouldAssumeDSOLocal currently implies dso_local for such definitions.

Since clang -fno-pic add the dso_local specifier, we don't need to special case.
2020-12-30 16:57:50 -08:00
Fangrui Song bf1160c1d6 [test] Add explicit dso_local to definitions in ELF static relocation model tests 2020-12-30 16:52:23 -08:00
Fangrui Song a64b89e69e [ARM][test] Add explicit dso_local to definitions in ELF static relocation model tests
TargetMachine::shouldAssumeDSOLocal currently implies dso_local for such definitions.

Adding explicit dso_local makes these tests align with the clang -fno-pic behavior
and allow the removal of the TargetMachine::shouldAssumeDSOLocal special case.
2020-12-30 15:23:21 -08:00
David Green 7a3e11fe96 [ARM] Add some NEON anyextend testing. NFC
This cleans up and regenerates the NEON addw/addl/subw/subl/mlal etc
tests, adding some tests that turn the zext into anyextend using an and
mask.
2020-12-27 13:18:10 +00:00
Evgeniy Brevnov 9fb074e7bb [BPI] Improve static heuristics for "cold" paths.
Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignores relative hotness/coldness
 of other paths.

New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights.

One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominates and post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together.

In addition new approach processes loops in an uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers.

Reviewed By: yrouban

Differential Revision: https://reviews.llvm.org/D79485
2020-12-23 22:47:36 +07:00
Kristof Beyls df8ed39283 [ARM] harden-sls-blr: avoid r12 and lr in indirect calls.
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct in case a
linker does so.  Similarly, the transformation is not correct when
register lr is used.

This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.

Differential Revision: https://reviews.llvm.org/D92469
2020-12-19 12:39:59 +00:00
Kristof Beyls a4c1f5160e [ARM] Harden indirect calls against SLS
To make sure that no barrier gets placed on the architectural execution
path, each indirect call calling the function in register rN, it gets
transformed to a direct call to __llvm_slsblr_thunk_mode_rN.  mode is
either arm or thumb, depending on the mode of where the indirect call
happens.

The llvm_slsblr_thunk_mode_rN thunk contains:

bx rN
<speculation barrier>

Therefore, the indirect call gets split into 2; one direct call and one
indirect jump.
This transformation results in not inserting a speculation barrier on
the architectural execution path.

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber r12 on function calls, the
above code transformation is not correct in case a linker does so.
Similarly, the transformation is not correct when register lr is used.
Avoiding r12/lr being used is done in a follow-on patch to make
reviewing this code easier.

Differential Revision: https://reviews.llvm.org/D92468
2020-12-19 12:33:42 +00:00
Kristof Beyls 320fd3314e [ARM] Implement harden-sls-retbr for Thumb mode
The only non-trivial consideration in this patch is that the formation
of TBB/TBH instructions, which is done in the constant island pass, does
not understand the speculation barriers inserted by the SLSHardening
pass. As such, when harden-sls-retbr is enabled for a function, the
formation of TBB/TBH instructions in the constant island pass is
disabled.

Differential Revision: https://reviews.llvm.org/D92396
2020-12-19 12:32:47 +00:00
Kristof Beyls 195f44278c [ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.

To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.

Hardening of indirect function calls will be done in a later,
independent patch.

This patch is implementing the same functionality as the AArch64 counter
part implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change.  I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.

This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.

The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.

On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision: https://reviews.llvm.org/D92395
2020-12-19 11:42:39 +00:00
Bjorn Pettersson a89d751fb4 Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:

    i19 @llvm.fptsi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:

    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something
like this instead:

    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.

Original patch by @nikic and @ebevhan (based on D54696).

Differential Revision: https://reviews.llvm.org/D54749
2020-12-18 11:09:41 +01:00
Yvan Roux 923ca0b411 [ARM][MachineOutliner] Fix costs model.
Fix candidates calls costs models allocation and prepare stack fixups
handling.

Differential Revision: https://reviews.llvm.org/D92933
2020-12-17 16:08:23 +01:00
David Green c100d7ba36 [NFC] Chec[^k] -> Check
Some test updates all appearing to use the wrong spelling of CHECK.
2020-12-08 11:54:39 +00:00
Martin Storsjö 78a57069b5 [CodeGen] Restore accessing __stack_chk_guard via a .refptr stub on mingw after 2518433f86
Add tests for this particular detail for x86 and arm (similar tests
already existed for x86_64 and aarch64).

The libssp implementation may be located in a separate DLL, and in
those cases, the references need to be in a .refptr stub, to avoid
needing to touch up code in the text section at runtime (which is
supported but inefficient for x86, and unsupported for arm).

Differential Revision: https://reviews.llvm.org/D92738
2020-12-07 09:35:12 +02:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Fangrui Song 109e70d357 [TargetMachine] Drop implied dso_local for an edge case (extern_weak + non-pic + hidden)
This does not deserve special handling. The code should be added to Clang
instead if deemed useful. With this simplification, we can additionally delete
the PIC extern_weak special case.
2020-12-05 15:52:33 -08:00
Fangrui Song a084c0388e [TargetMachine] Don't imply dso_local on function declarations in Reloc::Static model for ELF/wasm
clang/lib/CodeGen/CodeGenModule sets dso_local on applicable function declarations,
we don't need to duplicate the work in TargetMachine:shouldAssumeDSOLocal.
(Actually the long-term goal (started by r324535) is to drop TargetMachine::shouldAssumeDSOLocal.)

By not implying dso_local, we will respect dso_local/dso_preemptable specifiers
set by the frontend. This allows the proposed -fno-direct-access-external-data
option to work with -fno-pic and prevent a canonical PLT entry (SHN_UNDEF with non-zero st_value)
when taking the address of a function symbol.

This patch should be NFC in terms of the Clang emitted assembly because the case
we don't set dso_local is a case Clang sets dso_local. However, some tests don't
set dso_local on some function declarations and expose some differences. Most
tests have been fixed to be more robust in the previous commit.
2020-12-05 14:54:37 -08:00
Fangrui Song 6dbd0eac02 [test] Split some tests which test both static and pic relocation models
TargetMachine::shouldAssumeDSOLocal currently implies dso_local for
Static. Split some tests so that these `external dso_local global` will
align with the Clang behavior.
2020-12-04 18:11:35 -08:00
Evgeny Leviant f69936f529 Attempt to fix buildbot after rG993eaf2d69d8 2020-12-04 22:10:36 +03:00
Evgeny Leviant 993eaf2d69 Recommit [TableGen][SchedModels] Fix read/write variant substitution
Original commit rG112b3cb6ba49 introduced non-determinism in subtarget
generator due to iteration over DenseMap. New patch fixes this changing
ProcModelMapTy from DenseMap to std::map.
2020-12-04 21:50:34 +03:00
Fangrui Song 86fa896363 Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution"
This reverts commit 112b3cb6ba.

D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.
2020-12-03 14:24:29 -08:00
David Blaikie 615f63e149 Revert "[FastISel] Flush local value map on ever instruction" and dependent patches
This reverts commit cf1c774d6a.

This change caused several regressions in the gdb test suite - at least
a sample of which was due to line zero instructions making breakpoints
un-lined. I think they're worth investigating/understanding more (&
possibly addressing) before moving forward with this change.

Revert "[FastISel] NFC: Clean up unnecessary bookkeeping"
This reverts commit 3fd39d3694.

Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option"
This reverts commit a474657e30.

Revert "Remove static function unused after cf1c774."
This reverts commit dc35368ccf.

Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction""
This reverts commit 53a14a47ee.
2020-12-01 14:26:23 -08:00
Hans Wennborg 2ca4785ac7 Remove rm -f cortex-a57-misched-mla.s; hopefully the bots have all cycled past it now 2020-12-01 13:50:49 +01:00
Paul Robinson a474657e30 [FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option
This option is not used for anything after #dc35368 (D91734).
2020-11-30 10:55:49 -08:00
Dmitri Gribenko 234a5297aa Add 'asserts' requiremnt to test/CodeGen/ARM/cortex-a57-misched-mla.mir
'-debug-only=machine-scheduler' only works when asserts are enabled.
2020-11-30 15:19:27 +01:00
Hans Wennborg 25d54abca5 Try harder to get rid off cortex-a57-misched-mla.s 2020-11-30 14:30:50 +01:00
Hans Wennborg 273641fedc Try to fix bots after 112b3cb by removing cortex-a57-misched-mla.s 2020-11-30 14:15:56 +01:00
Evgeny Leviant 129523588f Fix test case 2020-11-30 12:35:28 +03:00
Evgeny Leviant 112b3cb6ba [TableGen][SchedModels] Fix read/write variant substitution
Patch fixes multiple issues related to expansion of variant sched reads and
writes.

Differential revision: https://reviews.llvm.org/D90844
2020-11-30 11:55:55 +03:00
Paul Robinson cf1c774d6a [FastISel] Flush local value map on ever instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

Differential Revision: https://reviews.llvm.org/D91734
2020-11-25 13:05:00 -05:00
Craig Topper 4252f7773a [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable which allowed
tablegen to autogenerate commuted patterns. This moves it to the target
independent definition and fix up the targets to remove now
unneeded patterns.

Unfortunately, the tests change because the commuted version of
the patterns are generating operands in a different than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
Matt Arsenault 1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault 20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Matt Arsenault 06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
David Green 32556a9832 [ARM] Remove more unused check prefixes, NFC 2020-11-14 15:37:53 +00:00
Pirama Arumuga Nainar 8262e94a6d [ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt.
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).

Differential Revision: https://reviews.llvm.org/D91192
2020-11-10 13:38:11 -08:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new instrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Momchil Velikov 937ab6a785 [ARM][MachineOutliner] Emit more CFI instructions
This patch make the outliner emit CFI instructions in a few more
places:

  * after LR is restored, but before the return in an outlined
  function

  * around save/restore of LR to/from a register at calls to outlined
  functions

  * around save/restore of LR to/from the stack at calls to outlined
  functions

The latter two only when the function does NOT spill LR. If the
function spills LR, then outliner generated saves/restores around
calls are not considered interesting for unwinding the frame.

Differential Revision: https://reviews.llvm.org/D89483
2020-11-09 15:26:18 +00:00