Commit Graph

51 Commits

Author SHA1 Message Date
Florian Hahn 786c687810
[AArch64] Add support for FMA intrinsics to shouldSinkOperands.
If the fma operates on a legal vector type, the indexed variants can be
used, if the second operand is a splat of a valid index.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D126234
2022-05-27 10:37:03 +01:00
Florian Hahn a9a012086a
[AArch64] Add additional tests for sinking free shuffles for FMAs. 2022-05-26 10:35:38 +01:00
Florian Hahn 8661725686
[AArch64] Add tests with free shuffles for indexed fma variants.
The new tests contain examples where shuffles are free, because indexed
fma instructions can be used.
2022-05-23 20:27:42 +01:00
Momchil Velikov d0ea42a7c1 [AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 16:50:50 +01:00
Momchil Velikov 50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
Hans Wennborg 85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5.
2022-03-04 17:36:26 +01:00
Momchil Velikov 63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d6794.

It causes test failures that look like infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov 74319d6794 [AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
2022-03-02 13:15:11 +00:00
Momchil Velikov 32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Florian Hahn 166968a892
[AArch64] Add test cases where zext can be lowered to series of tbl.
Add a set of tests for upcoming patches that allow lowering vector zext
using AArch64 tbl instructions instead of shifts.
2022-02-25 15:36:32 +00:00
Sunho Kim 44601f4956 [AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull
This teaches AArch64TargetLowering::shouldSinkOperands to sink the
operands of aarch64_neon_pmull intrinsic.

Differential Revision: https://reviews.llvm.org/D117944
2022-02-03 16:46:49 +00:00
Micah Weston 93deac2e2b [AArch64] Optimize add/sub with immediate through MIPeepholeOpt
Fixes the build issue with D111034, whose goal was to optimize
add/sub with long immediates.

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The change which fixed the build issue in D111034 was the use of new virtual
registers so that SSA form is maintained until deleting MI.

Differential Revision: https://reviews.llvm.org/D117429
2022-01-22 12:39:22 +00:00
Florian Hahn 62476c7c14
Revert "[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt"
This reverts commit e6698f0992.

This commit appears to introduce new machine verifier failures when
building the llvm-test-suite with `-mllvm -verify-machineinstrs` enabled:

https://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O3/11061/

FAILED: MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o
/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite-build/tools/timeit --summary MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG  -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin    -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk   -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o   -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c
*** Bad machine code: Illegal virtual register for instruction ***
- function:    alloc_tree
- basic block: %bb.1 if.else (0x7fc0db8f8bb0)
- instruction: %31:gpr64 = nsw MADDXrrr killed %39:gpr64sp, killed %25:gpr64, $xzr
- operand 1:   killed %39:gpr64sp
Expected a GPR64 register, but got a GPR64sp register
fatal error: error in backend: Found 1 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c'.
4.	Running pass 'Verify generated machine code' on function '@alloc_tree'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang         0x000000011191896b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43
1  clang         0x00000001119179b5 llvm::sys::RunSignalHandlers() + 85
2  clang         0x00000001119180e2 llvm::sys::CleanupOnSignal(unsigned long) + 210
3  clang         0x0000000111849f6a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang         0x0000000111849ee8 llvm::CrashRecoveryContext::HandleExit(int) + 24
5  clang         0x0000000111914acc llvm::sys::Process::Exit(int, bool) + 44
6  clang         0x000000010f4e9be9 LLVMErrorHandler(void*, char const*, bool) + 89
7  clang         0x0000000114eba333 llvm::report_fatal_error(llvm::Twine const&, bool) + 323
8  clang         0x0000000110d8c620 (anonymous namespace)::MachineVerifier::BBInfo::~BBInfo() + 0
9  clang         0x0000000110cdddca llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 378
10 clang         0x00000001110b0154 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092
11 clang         0x00000001110b6268 llvm::FPPassManager::runOnModule(llvm::Module&) + 72
12 clang         0x00000001110b074a llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986
13 clang         0x0000000111c20ad4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 3764
14 clang         0x0000000111f6dd31 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1905
15 clang         0x00000001131a28b3 clang::ParseAST(clang::Sema&, bool, bool) + 643
16 clang         0x00000001122b02a4 clang::FrontendAction::Execute() + 84
17 clang         0x000000011222d6a9 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 873
18 clang         0x000000011232faf5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 661
19 clang         0x000000010f4e9860 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 2544
20 clang         0x000000010f4e7168 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) + 312
21 clang         0x00000001120ab187 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) + 23
22 clang         0x0000000111849eb4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228
23 clang         0x00000001120aac24 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const + 324
24 clang         0x000000011207b85d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const + 221
25 clang         0x000000011207bdad clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const + 125
26 clang         0x0000000112092f7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) + 204
27 clang         0x000000010f4e6977 main + 10375
28 libdyld.dylib 0x00007fff6be90cc9 start + 1
29 libdyld.dylib 0x0000000000000018 start + 18446603338705728336
clang-14: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 14.0.0 (https://github.com/llvm/llvm-project.git c90d136be4)
Target: arm64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin
clang-14: note: diagnostic msg:
********************
2022-01-18 13:17:02 +00:00
Micah Weston e6698f0992 [AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt
Fixes the build issue with D111034, whose goal was to optimize
add/sub with long immediates.

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The change which fixed the build issue in D111034 was the use of new virtual
registers so that SSA form is maintained until deleting MI.

Differential Revision: https://reviews.llvm.org/D117429
2022-01-17 17:17:15 +00:00
David Green 760d4d03d5 [AArch64] Sink splat shuffles to lane index intrinsics
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.

Differential Revision: https://reviews.llvm.org/D112994
2021-11-22 08:11:35 +00:00
Ben Shi 59c3b48d99 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 3de3ca3137.
2021-11-03 14:15:21 +08:00
Ben Shi 3de3ca3137 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-11-03 03:06:43 +00:00
Ben Shi d0dbc991c0 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 9bf6bef995.
2021-10-16 22:17:18 +00:00
Ben Shi 9bf6bef995 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-10-16 08:50:39 +00:00
David Green adec922361 [AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.

The idea is that in-order cpu's require the most help in instruction
scheduling, whereas out-of-order cpus can for the most part out-of-order
schedule around different codegen. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.

On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.

When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.

A lot of existing tests have updated. This is a summary of the important
differences:
 - Most changes are the same instructions in a different order.
 - Sometimes this leads to very minor inefficiencies, such as requiring
   an extra mov to move variables into r0/v0 for the return value of a test
   function.
 - misched-fusion.ll was no longer fusing the pairs of instructions it
   should, as per D110561. I've changed the schedule used in the test
   for now.
 - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
   the different latencies. This seems fine to me.
 - Some SVE tests do not always remove movprfx where they did before due
   to different register allocation giving different destructive forms.
 - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
   produce two LDR where they previously produced an LDP due to
   store-pair-suppress kicking in.
 - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD.
 - Some tests such as arm64-neon-mul-div.ll and
   ragreedy-local-interval-cost.ll have more, less or just different
   spilling.
 - In aarch64_generated_funcs.ll.generated.expected one part of the
   function is no longer outlined. Interestingly if I switch this to use
   any other scheduled even less is outlined.

Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen, places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.

Differential Revision: https://reviews.llvm.org/D110830
2021-10-09 15:58:31 +01:00
David Green 92128b7801 [AArch64] Regenerate even more tests
This updates a few more check lines, in some mte tests that were close
to auto generated already and some CodeGenPrepare/consthoist tests where
being able to see the entire code sequence is useful for determining
whether code differences are improvements or not.
2021-10-06 14:32:01 +01:00
Andrew Wei c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
there’s matcher set [A, B] and [C, A], which will have cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match new phi node
to another new phi node, we should ignore new phi nodes when mapping new phi node to old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Tiehu Zhang 9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
David Green 013030a0b2 [AArch64] Correct sinking of shuffles to adds/subs
This was checking extends as shuffles, where as we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.

Differential Revision: https://reviews.llvm.org/D107623
2021-08-10 13:25:42 +01:00
David Green 3f74a68c35 [AArch64] Regenerate sink-free-instructions.ll. NFC 2021-08-10 13:25:42 +01:00
David Sherwood 8439415333 [IR] Let ConstantVector::getSplat use poison instead of undef
This patch updates ConstantVector::getSplat to use poison instead
of undef when using insertelement/shufflevector to splat.

This follows on from D93793.

Differential Revision: https://reviews.llvm.org/D107751
2021-08-10 08:27:43 +01:00
Juneyoung Lee ae6e89327b Precommit tests that have poison as shufflevector's placeholder
This commit copies existing tests at llvm/Transforms containing
'shufflevector X, undef' and replaces them with 'shufflevector X, poison'.
The new copied tests have *-inseltpoison.ll suffix at its file name
(as db7a2f347f did)
See https://reviews.llvm.org/D93793

Test files listed using

grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" | cut -d":" -f1 | uniq

Test files copied & updated using

file_org=llvm/test/Transforms/$1
if [[ "$file_org" = *-inseltpoison.ll ]]; then
  file=$file_org
else
  file=${file_org%.ll}-inseltpoison.ll
  if [ ! -f $file ]; then
    cp $file_org $file
  fi
fi
sed -i -E 's/^([^;]*)shufflevector <(.*)> (.*), <(.*)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
  echo "$file : should be manually updated"
  # The test is manually updated
  exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
2020-12-29 17:09:31 +09:00
Juneyoung Lee db7a2f347f Precommit transform tests that have poison as insertelement's placeholder
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)

Tests listed using this script:

grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq |
wc -l

Tests updated:

file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp $file_org $file
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
  echo "$file : should be manually updated"
  # I manually updated the script
  exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
2020-12-24 11:46:17 +09:00
Paul Walker 6d35bd1d48 [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors.
optimizeGatherScatterInst does nothing specific to fixed length vectors
but uses FixedVectorType to extract the number of elements.  This patch
simply updates the code to use VectorType and getElementCount instead.

For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll
replacing `<4 x ` with `<vscale x 4`.

Differential Revision: https://reviews.llvm.org/D92572
2020-12-15 10:57:51 +00:00
Florian Hahn 7769030b93 Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.

This reverts the revert commit
c7fc0e5da6.
2020-02-23 18:33:18 +00:00
Florian Hahn c7fc0e5da6 Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This reverts commit e01a3d49c2.
and commit a6a585b803.

This causes a failure on GreenDragon:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597
2020-02-19 19:37:08 +01:00
Florian Hahn e01a3d49c2 [PatternMatch] Match XOR variant of unsigned-add overflow check.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D74228
2020-02-19 15:25:18 +01:00
Florian Hahn 7cbf710396 [CGP] Precommit tests for D74228. 2020-02-19 09:24:06 +01:00
Florian Hahn 106ae108c1 [CGP] Add uaddo test with math used, SPARC/AArch64 variants. 2020-02-18 12:49:08 +01:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Florian Hahn 3b251963c3 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
Eli Friedman a69084ffa8 [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.

Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).

Differential Revision: https://reviews.llvm.org/D55729

llvm-svn: 349693
2018-12-19 22:52:04 +00:00
Krzysztof Pszeniczny 2bfe759a8d Fix a use-after-RAUW bug in large GEP splitting
Summary:
Large GEP splitting, introduced in rL332015, uses a `DenseMap<AssertingVH<Value>, ...>`. This causes an assertion to fail (in debug builds) or undefined behaviour to occur (in release builds) when a value is RAUWed.

This manifested itself in the 7zip benchmark from the llvm test suite built on ARM with `-fstrict-vtable-pointers` enabled while RAUWing invariant group launders and splits in CodeGenPrepare.

This patch merges the large offsets of the argument and the result of an invariant.group strip/launder intrinsic before RAUWing.

Reviewers: Prazek, javed.absar, haicheng, efriedma

Reviewed By: Prazek, efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D51936

llvm-svn: 344802
2018-10-19 19:02:16 +00:00
Haicheng Wu 0aae2bc260 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Matt Arsenault 90e4f719e1 Fix some misc. -enable-var-scope violations
llvm-svn: 318006
2017-11-13 01:47:52 +00:00
Jun Bum Lim 90b6b5074a [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Jun Bum Lim f9416af191 Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Jun Bum Lim 85347dde27 [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Joerg Sonnenberger caaa82d90d Revert r287553: [CodeGenPrep] Skip merging empty case blocks
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").

llvm-svn: 288052
2016-11-28 18:56:54 +00:00
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
NAKAMURA Takumi 0b498c7af8 Move free-zext.ll to llvm/test/Transforms/CodeGenPrepare/AArch64/
llvm-svn: 253730
2015-11-20 22:55:34 +00:00
Sanjay Patel 0ed9aeaa5f [CGP] widen switch condition and case constants to target's register width (2nd try)
This is a redo of r251849 except the tests have been split into arch-specific folders
to hopefully make the bots happy.

This is a follow-up from the discussion in D12965. The block-at-a-time limitation of
SelectionDAG also came up in D13297.

Without the InstCombine change from D12965, I don't expect this patch to make any
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.

I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473

Before:
BB#0:
  mr 4, 3
  extsh. 3, 4
  ble 0, .LBB0_5
 BB#1:
  cmpwi  3, 99
  bgt    0, .LBB0_9
 BB#2:
  rlwinm 4, 4, 0, 16, 31      <--- 32-bit mask/extend
  li 3, 0
  cmplwi         4, 1
  beqlr 0
 BB#3:
  cmplwi         4, 10
  bne    0, .LBB0_12
 BB#4:
  li 3, 1
  blr
.LBB0_5:
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi         3, 65436
  beq    0, .LBB0_13
 BB#6:
  cmplwi         3, 65526
  beq    0, .LBB0_15
 BB#7:
  cmplwi         3, 65535
  bne    0, .LBB0_12
 BB#8:
  li 3, 4
  blr
.LBB0_9:
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi         3, 100
  beq    0, .LBB0_14
...

After:
BB#0:
  rlwinm 4, 3, 0, 16, 31      <--- mask/extend to 32-bit and then use that for comparisons
  cmpwi  4, 999
  ble 0, .LBB0_5
 BB#1:
  lis 3, 0
  ori 3, 3, 65525
  cmpw   4, 3
  bgt    0, .LBB0_9
 BB#2:
  cmplwi         4, 1000
  beq    0, .LBB0_14
 BB#3:
  cmplwi         4, 65436
  bne    0, .LBB0_13
 BB#4:
  li 3, 6
  blr
.LBB0_5:
  li 3, 0
  cmplwi         4, 1
  beqlr 0
 BB#6:
  cmplwi         4, 10
  beq    0, .LBB0_12
 BB#7:
  cmplwi         4, 100
  bne    0, .LBB0_13
 BB#8:
  li 3, 2
  blr
.LBB0_9:
  cmplwi         4, 65526
  beq    0, .LBB0_15
 BB#10:
  cmplwi         4, 65535
  bne    0, .LBB0_13
...


Differential Revision: http://reviews.llvm.org/D13532

llvm-svn: 251857
2015-11-02 23:22:49 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00