Commit Graph

4645 Commits

Author SHA1 Message Date
Jonas Paulsson 122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson 17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Bjorn Pettersson a11faeed44 [test] Switch to use -passes syntax in various test cases 2022-12-01 21:25:59 +01:00
Jonas Paulsson 8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4.
2022-12-01 13:29:24 -05:00
Jonas Paulsson 6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Freddy Ye 89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Marco Elver b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Craig Topper f387918dd8 [TargetLowering][RISCV][ARM][AArch64][Mips] Reduce the number of AND mask constants used by BSWAP expansion.
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Similar was done to bitreverse previously.

Differential Revision: https://reviews.llvm.org/D138045
2022-11-15 14:36:01 -08:00
Nicholas Guy d52e2839f3 [ARM][CodeGen] Add support for complex deinterleaving
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.

Differential Revision: https://reviews.llvm.org/D114174
2022-11-14 14:02:27 +00:00
Nikita Popov 01ec0ff2dc [SimplifyCFG] Allow speculating block containing assume()
SpeculativelyExecuteBB(), which converts a branch + phi structure
into a select, currently bails out if the block contains an assume
(because it is not speculatable).

Adjust the fold to ignore ephemeral values (i.e. assumes and values
only used in assumes) for cost modelling purposes, and drop them
when performing the fold.

Theoretically, we could try to preserve the assume information by
generating a assume(br_cond || assume_cond) style assume, but this
is very unlikely to to be useful (because we don't do anything
useful with assumes of this form) and it would make things
substantially more complicated once we take operand bundle assumes
into account (which don't really support a || operation).
I'd prefer not to do that without good motivation.

Differential Revision: https://reviews.llvm.org/D137339
2022-11-04 09:26:35 +01:00
David Green f970b007e5 [ARM] Fix vector ule zero lowering
The instruction icmp ule <4 x i32> %0, zeroinitializer will usually be
simplified to icmp eq <4 x i32> %0, zeroinitializer. It is not
guaranteed though, and the code for lowering vector compares could pick
the wrong form of the instruction if this happened. I've tried to make
the code more explicit about the supported conditions.

This fixes NEON being unable to select VCMPZ with HS conditions, and
fixes some incorrect MVE patterns.

Fixes #58514.

Differential Revision: https://reviews.llvm.org/D136447
2022-11-02 22:34:05 +00:00
John Brawn 88ac25b357 [MachineCSE] Allow PRE of instructions that read physical registers
Currently MachineCSE forbids PRE when the instruction reads a physical
register. Relax this so that it's allowed when the value being read is
the same as what would be read in the place the instruction would be
hoisted to.

This is being done in preparation for adding FPCR handling to the
AArch64 backend, in order to prevent it to from worsening the
generated code, but for targets that already have a similar register
it should improve things.

This patch affects code generation in several tests. The new code
looks better except for in Thumb2/LowOverheadLoops/memcall.ll where
we perform PRE but the LowOverheadLoops transformation then undoes
it. Also in AMDGPU/selectcc-opt.ll the CHECK makes things look worse,
but actually the function as a whole is better (as a MOV is PRE'd).

Differential Revision: https://reviews.llvm.org/D136675
2022-11-02 13:53:12 +00:00
David Green b5caa68fb2 [ARM] Tests for various NEON vector compares. NFC 2022-11-01 15:00:56 +00:00
Matt Arsenault b60a9ccd02 AtomicExpand: Use InstSimplifyFolder
Automatically cleanup operations if we know the atomic has higher
alignment.
2022-10-31 23:31:42 -07:00
Daniel Thornburgh 75cdab6dc2 [llvm-objdump] Add --no-print-imm-hex to tests depending on it.
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.

See D136972 for details.
2022-10-29 15:40:26 -07:00
Patrick Walton f3d49dbcb1 [test] Remove readonly from some parameters that are written through in tests.
In D136659 I found a few tests that write through readonly parameters:

* Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it
readonly. I removed the readonly annotation.

* CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly
%arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and
@store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed
readonly from all three. Also, I added some CHECK-LABEL directives to make it
harder for FileCheck output to be mixed up.

* Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll:
@gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the
readonly attribute.

* Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes
through the readonly %P1 and %P2. Also, the corresponding C code in the comment
didn't match the test. I removed the readonly attribute from both parameters
and corrected the C code.

Differential Revision: https://reviews.llvm.org/D136880
2022-10-29 15:05:20 -07:00
Matt Arsenault a8762195d5 ARM: Fix stack warning test 2022-10-28 22:28:37 -07:00
Matt Arsenault c62745e167 DiagnosticInfo: Report function location for resource limits
We have some odd redundancy where clang specially handles
the stack size case. If clang prints it, the source location is first
followed by "warning". The backend diagnostic, as printed by other tools
puts "warning" first.
2022-10-28 21:42:57 -07:00
Nikita Popov 9481e4ba62 [ARM] Use DefaultAttrsIntrinsics
Use DefaultAttrsIntrinsics for most ARM intrinsics. This adds the
WillReturn, NoSync, NoFree and NoCallback attributes and is needed
to avoid regressions in the future.

I've switched to DefaultAttrIntrinsics for everything doing arithmetic
and load/store. I've left some TODOs in cases where all DefaultsAttrs
are not correct (e.g. ldrex etc are clearly not nosync) or it wasn't
entirely obvious to me (e.g. stuff interacting with a coprocessor).

Differential Revision: https://reviews.llvm.org/D136758
2022-10-28 09:59:38 +02:00
Paul Kirth 2e1e2f52f3 [CodeGen] Improve large stack frame diagnostic
Add statistics about how much memory is used, in variables, spills, and
unsafestack.

Issue #58168 describes some of the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. D135488 addresses some of those issues by
giving developers a method to view the stack layout and thereby understand
where and how stack memory is used.

However, that solution requires an additional pass, when a short summary about
how the compiler has allocated stack memory can inform developers about where
they should investigate. When they need the complete context, D135488 can
provide them with a more comprehensive set of diagnostics.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D136484
2022-10-27 00:51:45 +00:00
David Green fb76d2ce6c [ARM] Fix the type for v4f16 duplane
This was previously using the 32bit variant of the instruction, instead
of the 16bit as intended.

Fixes #58512

Differential Revision: https://reviews.llvm.org/D136422
2022-10-21 10:10:35 +01:00
David Green 8c1a508616 [ARM] Regnereate armv8.2a-fp16-vector-intrinsics.ll test. NFC 2022-10-21 09:35:39 +01:00
chenglin.bi b18293edc3 [MC][COFF] Add COFF section flag "Info"
For now, we have not parse section flag `Info` in asm file. When we emit a section with info flag to asm, then compile asm to obj we will lose the Info flag for the section.
The motivation of this change is ARM64EC's hybmp$x section. If we lose the Info flag MSVC link will report a warning:
`warning LNK4078: multiple '.hybmp' sections found with different attributes`

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D136125
2022-10-19 10:32:58 +08:00
Keith Walker 6102364b0d [ARM] Add additional targets to divide tests.
The main motivation for these additional targets is to cover the
differences in the instructions available between Thumb2 and Thumb1.

Ths shows up in these test due to the lack of the following in
Thumb1:
- Mulitply and Subtract instruction (mls) - used when calculating
  a remainder.
- Unsigned Muliple Long instruction (umull) - used in certain
  cases when optimising division with a constant.

Differential Revision: https://reviews.llvm.org/D135875
2022-10-17 16:51:17 +01:00
Archibald Elliott 7d15212b8c [ARM] Support fp16/bf16 using w constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "w"
constraint, which means "VFP floating-point registers d0-d31" - fp16 and
bf16 values are stored in S registers (which alias the D registers).

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 'w' constraint.

Differential Revision: https://reviews.llvm.org/D135662
2022-10-13 10:32:06 +01:00
David Green 5e1a9d319d [ARM] Add lowering for bf16 neon vtrn, vzup and vuzp.
These go via Dag2Dag, which are better based on element sizes not the
exact element types.
2022-10-02 15:34:37 +01:00
David Green f2fde99461 [ARM] More bf16 shuffle handling, including perfect shuffles. 2022-10-02 14:31:51 +01:00
David Green 8193f0d1d2 [ARM] Add tablegen patterns for bf16 vrev 2022-10-02 13:42:14 +01:00
David Green 58369c8631 [ARM] Add tablegen patterns for bf16 vext
This adds missing tablegen patterns for VEXT, identical to the fp16
patterns as they only use baseline Neon operations.
Part of fixing #57770.
2022-10-02 12:45:58 +01:00
David Green 3651635eca [ARM][DAG] BF16 constant handling.
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant.  The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
2022-10-02 11:51:08 +01:00
Filipp Zhinkin 945a1468c9 [ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Reviewed By: efriedma, samtebbs

Differential Revision: https://reviews.llvm.org/D131786
2022-10-01 12:41:37 +03:00
Archibald Elliott ff4027d152 [ARM] Support fp16/bf16 using t constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "t"
constraint, which means "VFP floating-point registers s0-s31" - fp16 and
bf16 values are stored in S registers too.

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 't' constraint.

Fixes #57753

Differential Revision: https://reviews.llvm.org/D134553
2022-09-28 14:48:21 +01:00
Momchil Velikov 6602110152 [ARM] Enable and/cmp0 folding
The `CodeGenPrepare` pass can sink bitwise `and` used by compare to
zero into the basic blocks where the users are. This operation is
guarded by lowering hook, which is disabled for ARM.  In the ARM
architecture versions from v7-M up these two operations can be folded
into `tst rN, #imm` instruction. Sinking of `and` can also enable
the cmov-to-bfi DAG combiner.

This patch fixes some benchmark regressions caused
by https://reviews.llvm.org/D129370 as well scoring slightly better overall.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D134360
2022-09-26 11:31:23 +01:00
Momchil Velikov a412e9cd40 [ARM] Enable and/cmp0 folding - precommit test
Precommit test for D134360

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D134358
2022-09-26 11:05:49 +01:00
David Green 2b187effbd [ARM] Fix check lines in memcpy-inline.ll test. NFC
Commit c442698 updated the check lines in this file, but did so in a
way that removed a number of the existing checks, as the
update_llc_test_checks script does not understand all triples. This
fixes it up as needed to keep testing Thumb1 code.
2022-09-24 21:29:41 +01:00
Guillaume Chatelet c442698091 [NFC] update_llc_test_checks llvm/test/CodeGen/ARM/memcpy-inline.ll 2022-09-23 09:00:38 +00:00
Filipp Zhinkin fa67e281b2 [ARM] Add more tests on instructions fusion with comparison with zero; NFC
Baseline tests for D131786
2022-09-17 11:49:58 +03:00
Liqiang Tao 2e37557fde StackProtector: ensure stack checks are inserted before the tail call
The IR stack protector pass should insert stack checks before the tail
calls not only the musttail calls. So that the attributes `ssqreq` and
`tail call`, which are emited by llvm-opt, could be both enabled by
llvm-llc.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D133860
2022-09-16 22:24:46 +08:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Filipp Zhinkin c4d0509e3b [ARM] Add tests on instructions fusion with comparison with zero; NFC
Baseline tests for D131786
2022-09-08 20:24:32 +03:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
John Brawn e26cadcc32 [ARM] Constant pools need 4-byte alignment if we only have tADR
When the only ADR instruction we have is the 16-bit thumb one then all
constant pool entries need to be 4-byte aligned, as tADR has an offset
that's a multiple of 4.

It looks like previously there happened to be no situations in which
we encountered a constant pool entry with alignment less than 4, so
failing to do this didn't cause any problems, but the expansion of
cttz to a table added by D128911 does use a constant pool with
alignment 1, so we now need to handle it correctly.

Differential Revision: https://reviews.llvm.org/D133199
2022-09-06 11:36:12 +01:00
Daniil Fukalov b4e1b0e00d [LiveIntervals] Split live intervals on any dead def
Each dead def of the same virtual register is required to be split into multiple
virtual registers with separate live intervals to avoid MachineVerifier error.

Partially fixes https://github.com/llvm/llvm-project/issues/56050 and
https://github.com/llvm/llvm-project/issues/56051

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D130477
2022-09-02 20:00:22 +03:00
Alex Richardson df00dac828 [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to symbol+offset instead of emitting
a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-26 09:34:06 +00:00
Alex Richardson 0483b00875 Mark the $local function begin symbol as a function
While this does not matter for most targets, when building for Arm Morello,
we have to mark the symbol as a function and add size information, so that
LLD can correctly evaluate relocations against the local symbol.
Since Morello is an out-of-tree target, I tried to reproduce this with
in-tree backends and with the previous reviews applied this results in
a noticeable difference when targeting Thumb.

Background: Morello uses a method similar Thumb where the encoding mode is
specified in the LSB of the symbol. If we don't mark the target as a
function, the relocation will not have the LSB set and calls will end up
using the wrong encoding mode (which will almost certainly crash).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131429
2022-08-26 09:34:04 +00:00
Stephen Long 525af9f8eb [MC] Omit fill value if it's zero when emitting code alignment
Previously, we were generating zeroes when generating code alignments for AArch64, but now we should omit the value and let the assembler choose to generate nops or zeroes.

Reviewed By: efriedma, MaskRay

Differential Revision: https://reviews.llvm.org/D132508
2022-08-25 10:07:33 -07:00
Alvin Wong c0214db51a [llvm] Mark CFGuard fn ptr symbol as DSO local and add tests for mingw
For mingw target, if a symbol is not marked DSO local, a `.refptr` is
generated for it. This makes CFG check calls use an extra pointer
dereference, which adds extra overhead compared to the MSVC version,
so mark the CFG guard check funciton pointer DSO local to stop it.
This should have no effect on MSVC target.

Also adapt the existing cfguard tests to run for mingw targets, so that
this change is checked.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D132331
2022-08-23 23:39:39 +03:00
Alan Zhao 8c8cfaaf0a Revert "[ARM] Use getSymbolPreferLocal() in GetARMGVSymbol"
This reverts commit 6db15a82cc.

Reverted because this breaks offical Chrome builds targeting Android on
arm: https://crbug.com/1354305

Repro: https://drive.google.com/file/d/1pgQI2adwx3DJJqIYvMY4i249ouHU0rmu/view?usp=sharing
2022-08-22 16:16:37 -04:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00