Commit Graph

59187 Commits

Author SHA1 Message Date
Michael Liao bf41c4d29e [codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their
  target flags being cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109
2020-09-03 18:37:39 -04:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Simon Pilgrim 83ca548fcb WebAssemblyUtilities.h - reduce unnecessary includes to forward declarations. NFCI. 2020-09-03 17:43:35 +01:00
Simon Pilgrim 58afaecdc2 X86/X86TargetObjectFile.cpp - remove unused headers. NFCI. 2020-09-03 15:17:44 +01:00
Simon Pilgrim 0563cd6739 Fix spelling mistake. NFC. 2020-09-03 15:17:44 +01:00
Simon Pilgrim 890707aa01 [X86] Avoid llvm-qualified-auto warning by not using auto. NFC.
Try to consistently use the actual type name in the file.
2020-09-03 14:21:17 +01:00
Simon Pilgrim 23d9f4b958 [X86] Fix llvm-qualified-auto warning by using auto*. NFC. 2020-09-03 14:21:17 +01:00
Simon Pilgrim 5b29269744 [X86] Fix llvm-qualified-auto warning by using const auto*. NFC. 2020-09-03 14:21:17 +01:00
Simon Pilgrim e56edb801b [X86][SSE] Fold select(X > -1, A, B) -> select(0 > X, B, A) (PR47404)
Help PBLENDVB peek through to the sign bit source of the selection mask by swapping the select condition and inputs.
2020-09-03 13:02:08 +01:00
Ben Shi c5716447c1 [NFC][RISCV] Simplify pass arg of RISCVMergeBaseOffsetOpt
Reviewed By: lenary, asb

Differential Revision: https://reviews.llvm.org/D87069
2020-09-03 20:01:23 +08:00
Martin Storsjö f5e2ea9a43 [AArch64] Add asm directives for the remaining SEH unwind codes
Add support in llvm-readobj for displaying them and support in the
asm parsser, AArch64TargetStreamer and MCWin64EH for emitting them.

The directives for the remaining basic opcodes have names that
match the opcode in the documentation.

The directives for custom stack cases, that are named
MSFT_OP_TRAP_FRAME, MSFT_OP_MACHINE_FRAME, MSFT_OP_CONTEXT
and MSFT_OP_CLEAR_UNWOUND_TO_CALL, are given matching assembler
directive names that fit into the rest of the opcode naming;
.seh_trap_frame, .seh_context, .seh_clear_unwound_to_call

The opcode MSFT_OP_MACHINE_FRAME is mapped to the existing
opecode enum UOP_PushMachFrame that is used on x86_64, and also
uses the corresponding existing x86_64 directive name
.seh_pushframe.

Differential Revision: https://reviews.llvm.org/D86889
2020-09-03 11:12:01 +03:00
Nemanja Ivanovic 69289cc10f [PowerPC] Fix broken kill flag after MI peephole
The test case in https://bugs.llvm.org/show_bug.cgi?id=47373 exposed
two bugs in the PPC back end. The first one was fixed in commit
2771407584 but the test case had to
be added without -verify-machineinstrs due to the second bug.
This commit fixes the use-after-kill that is left behind by the
PPC MI peephole optimization.
2020-09-02 17:07:49 -05:00
Nemanja Ivanovic 2771407584 [PowerPC] Do not legalize vector FDIV without VSX
Quite a while ago, we legalized these nodes as we added custom
handling for reciprocal estimates in the back end. We have since
moved to target-independent combines but neglected to turn off
legalization. As a result, we can now get selection failures on
non-VSX subtargets as evidenced in the listed PR.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373
2020-09-02 16:03:36 -05:00
Albion Fung 5d1fe3f903 [PowerPC] Implemented Vector Multiply Builtins
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.

Differential Revision: 	https://reviews.llvm.org/D83955
2020-09-02 14:16:21 -05:00
Dmitry Preobrazhensky ecde200209 [AMDGPU][MC] Corrected parser to avoid generation of excessive error messages
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.

See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D86940
2020-09-02 19:42:18 +03:00
Simon Pilgrim 888049b97a [X86][SSE] Fold vselect(pshufb,pshufb) -> or(pshufb,pshufb)
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.

Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
2020-09-02 16:55:00 +01:00
Jay Foad 4bdab2e86a [AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.

Differential Revision: https://reviews.llvm.org/D86938
2020-09-02 10:55:55 +01:00
Sander de Smalen f13beac51b [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For  e.g.

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Martin Storsjö 4820af2bfc [X86] Remove superfluous trailing semicolons, fixing warnings. NFC. 2020-09-02 11:43:27 +03:00
Simon Pilgrim 21d02dc595 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.

Differential Revision: https://reviews.llvm.org/D66004
2020-09-02 09:24:46 +01:00
Amy Kwan 0c2d872d5d [PowerPC] Implement builtins for xvcvspbf16 and xvcvbf16spn
This patch adds the builtin implementation for the xvcvspbf16 and xvcvbf16spn
instructions.

Differential Revision: https://reviews.llvm.org/D86795
2020-09-01 17:16:43 -05:00
Amara Emerson 520ab710fb Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")"
This reverts commit 8693ddc743.

Re-committing with the test requiring asserts.
2020-09-01 14:29:04 -07:00
Jordan Rupprecht 8693ddc743 Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")
This reverts commit 8ad8f484b6. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g.
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio.
Note that the crash does not seem to reproduce in debug builds.

5ded444252 depends on this, so revert that too.
2020-09-01 13:31:57 -07:00
Michael Liao 1f4e7463b5 [amdgpu] Run SROA after loop unrolling.
Summary: - There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Owen Anderson 5987da8764 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Sean Fertile fecc27db11 [PowerPC][AIX] Update save/restore offset for frame and base pointers.
General purpose registers 30 and 31 are handled differently when they are
reserved as the base-pointer and frame-pointer respectively. This fixes the
offset of their fixed-stack objects when there are fpr calle-saved registers.

Differential Revision: https://reviews.llvm.org/D85850
2020-09-01 14:13:05 -04:00
Amara Emerson 5ded444252 [AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz.
Usually brconds are fed by compares, but not always, in which case we would
miss this fold.

Differential Revision: https://reviews.llvm.org/D86413
2020-09-01 11:06:06 -07:00
Eric Astor a57fdcdd40 x87 FPU state instructions do not use an f32 memory location
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86942
2020-09-01 13:50:07 -04:00
Qiu Chaofan 29ae448595 [PowerPC] Handle STRICT_FSETCC(S) in more cases
On -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that
in TD patterns. This patch fills logic in PPCISelDAGToDAG to handle more
cases.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D86595
2020-09-02 00:33:21 +08:00
Amy Kwan ca2227c1b3 [PowerPC] Implement instruction definitions/MC Tests for xvcvspbf16 and xvcvbf16spn
This patch adds the td instruction definitions of the xvcvspbf16 and xvcvbf16spn
instructions, along with their respective MC tests.

Differential Revision: https://reviews.llvm.org/D86794
2020-09-01 10:59:43 -05:00
Paul Walker bc9a29b9ee Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit e9d9a61208.

This patch was previously revert by 04879086b4
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.

For extra context the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
2020-09-01 16:09:37 +01:00
Matt Arsenault 18bbd9f15e GlobalISel: Artifact combine unmerge of unmerge
Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC
could be implemented in terms of G_UNMERGE_VALUES. Reducing the number
of elements in unmerge results ends up producing the original unmerge
type profile, so the artifact combiner needs to eliminate the
intermediate illegal registers. This avoids infinite looping in the
legalizer in a future change.

Assuming an unmerge has each result unmerged the same way, this ends
up producing a new unmerge of the source for every definition. I'm not
sure if the artifact combiner should either insert temporary merges
here and erase the original merge, or if the combiner should look at
uses from defs rather than defs from uses for unmerges.

In a few cases this regresses from using 16-bit shifts for 8-bit
values to using 32-bit shifts, but I think these can be legalized
later (the other legalization rules don't try very hard to use 16-bit
shifts either).
2020-09-01 11:01:33 -04:00
David Green ffd0b31c7c Revert "[ARM] Register pressure with -mthumb forces register reload before each call"
Expensive checks are failing, complaining about additional MMO operands
added to the branch.
2020-09-01 07:39:54 +01:00
Prathamesh Kulkarni 85b4d286d7 [ARM] Register pressure with -mthumb forces register reload before each call
This patch implements the foldMemoryOperand hook in Thumb1InstrInfo,
allowing tBLXr and a spilled function address to be combined back into a
tBL. This can help with codesize at Oz, especailly in the tinycrypt
library.

Differential Revision: https://reviews.llvm.org/D79785
2020-08-31 20:00:30 +01:00
Sanjay Patel 2d3e12818e [FastISel] update to use intrinsic's isCommutative(); NFC
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.

The intrinsic method was added with D86798 / rG096527214033
2020-08-30 11:36:41 -04:00
Krzysztof Parzyszek 69fac677bc [Hexagon] Fix perfect shuffle generation for single vectors
Perfect shuffle instruction (vdealvdd/vshuffvdd) work on vector
pairs. When given a single input vector, half of it first needs
to be transposed into the other vector before the generated
shuffles can take effect. Also the first transpose needs to be
undone at the end (this last step was missing).
2020-08-30 06:43:16 -05:00
Martin Storsjö 5b86d130e2 [AArch64] Generate and parse SEH assembly directives
This ensures that you get the same output regardless if generating
code directly to an object file or if generating assembly and
assembling that.

Add implementations of the EmitARM64WinCFI*() methods in
AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer.

Add corresponding directive handlers in AArch64AsmParser and
COFFAsmParser.

Some SEH directive names have been picked to match the prior art
for SEH assembly directives for x86_64, e.g. the spelling of
".seh_startepilogue" matching the preexisting ".seh_endprologue".

For the directives for saving registers, the exact spelling
from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow
that naming for all the other ones, e.g. ".seh_save_fregp_x"), while
the corresponding one for x86_64 is plain ".seh_savereg" without the
second underscore.

Directives in the epilogues have the same names as in prologues,
e.g. .seh_savereg, even though the registers are restored, not
saved, at that point.

Differential Revision: https://reviews.llvm.org/D86529
2020-08-29 15:15:22 +03:00
Rainer Orth 672d7836bb [Target][AArch64] Allow for char as int8_t in AArch64AsmParser.cpp
A couple of AArch64 tests were failing on Solaris, both sparc and x86:

  LLVM :: MC/AArch64/SVE/add-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy.s
  LLVM :: MC/AArch64/SVE/dup-diagnostics.s
  LLVM :: MC/AArch64/SVE/dup.s
  LLVM :: MC/AArch64/SVE/mov-diagnostics.s
  LLVM :: MC/AArch64/SVE/mov.s
  LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s
  LLVM :: MC/AArch64/SVE/sub-diagnostics.s
  LLVM :: MC/AArch64/SVE/subr-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s

For example, reduced from `MC/AArch64/SVE/add-diagnostics.s`:

  add     z0.b, z0.b, #0, lsl #8

missed the expected diagnostics

  $ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s
  add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0
  add     z0.b, z0.b, #0, lsl #8
                      ^

The message is `Match_InvalidSVEAddSubImm8`, emitted in the generated
`lib/Target/AArch64/AArch64GenAsmMatcher.inc` for `MCK_SVEAddSubImm8`.
When comparing the call to `::AArch64Operand::isSVEAddSubImm<char>` on both
Linux/x86_64 and Solaris, I find

  875	    bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value;

is `false` on Solaris, unlike Linux.

The problem boils down to the fact that `int8_t` is plain `char` on
Solaris: both the sparc and i386 psABIs have `char` as signed.  However,
with

  9887	    DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>());

in `lib/Target/AArch64/AArch64GenAsmMatcher.inc`, `std::make_signed_t<int8_t>`
above yieds `signed char`, so `std::is_same<int8_t, signed char>` is `false`.

This can easily be fixed by also allowing for `int8_t` here and in a few
similar places.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D85225
2020-08-29 10:01:04 +02:00
Matt Arsenault 9145d75226 AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples
The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.
2020-08-28 17:50:37 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Sjoerd Meijer 5f1cad4d29 [ARM] Skip combining base updates for vld1x NEON intrinsics
Skip this for now, to avoid a backend crash in:

  UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412

This should fix PR45824.

Differential Revision: https://reviews.llvm.org/D86784
2020-08-28 20:29:15 +01:00
Benjamin Kramer 8782c72765 Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
Anna Welker 064981f0ce [ARM][MVE] Enable MVE gathers and scatters by default
Enable MVE gather/scatters by default, which requires some
minor adaptations in some tests.

Differential revision: https://reviews.llvm.org/D86776
2020-08-28 19:05:29 +01:00
David Green 4ca60915bc [ARM] Correct predicate operand for offset gather/scatter
These arm_mve_vldr_gather_offset_predicated and
arm_mve_vstr_scatter_offset_predicated have some extra parameters
meaning the predicate is at a later operand. If a loop contains _only_
those masked instructions, we would miss transforming the active lane
mask.

Differential Revision: https://reviews.llvm.org/D86791
2020-08-28 17:48:15 +01:00
Albion Fung 331dcc43ea [PowerPC] Implemented Vector Load with Zero and Signed Extend Builtins
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.

Differential Revision: 	https://reviews.llvm.org/D82502#inline-797941
2020-08-28 11:28:58 -05:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Sam Parker b30adfb529 [ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613
2020-08-28 13:56:16 +01:00
Ties Stuij d678e14c55 [AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported
Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type a number of
operations shouldn't be set. This patch fixes that.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85101
2020-08-28 11:44:37 +01:00
Kai Luo cbea17568f [PowerPC] PPCBoolRetToInt: Don't translate Constant's operands
When collecting `i1` values via `findAllDefs`, ignore Constant's
operands, since Constant's operands might not be `i1`.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE
```
llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed.
```

Differential Revision: https://reviews.llvm.org/D85007
2020-08-28 01:56:12 +00:00
Matt Arsenault af1c1e20f4 AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize 2020-08-27 19:39:44 -04:00
Matt Arsenault 9d3dc276a6 AMDGPU: Fix broken switch braces 2020-08-27 19:39:39 -04:00
Matt Arsenault a1bc37c9e5 AMDGPU: Use caller subtarget, not intrinsic declaration
Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.
2020-08-27 16:42:09 -04:00
Krzysztof Parzyszek 4ef9275b9b [Hexagon] Emit better 32-bit multiplication sequence for HVXv62+ 2020-08-27 15:24:32 -05:00
Mikhail Maltsev ae1396c7d4 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).

The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.

This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:43:16 +01:00
Craig Topper ba852e1e19 [X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl
We only need to call getFnAttribute and then check if the Attribute
is None or not.
2020-08-27 10:40:20 -07:00
Owen Anderson e9d9a61208 Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00
Benjamin Kramer b5924a8e27 [Hexagon] Fold another layer of single-use variable into assert. NFCI. 2020-08-27 16:52:34 +02:00
Benjamin Kramer 2b7df2707f [Hexagon] Fold single-use variable into assert. NFCI. 2020-08-27 16:44:22 +02:00
Matt Arsenault 6c770a09be AMDGPU: Hoist subtarget lookup 2020-08-27 10:27:56 -04:00
Krzysztof Parzyszek 154daf1f94 [Hexagon] Widen short vector stores to HVX vectors using masked stores
Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold
for widening short vectors to HVX vectors.
2020-08-27 09:25:08 -05:00
Jay Foad 45eeb8c2a9 [AMDGPU] Remove unused variable introduced in r251860 2020-08-27 13:28:32 +01:00
Mikhail Maltsev 23d5e93f34 [AArch64] Optimize instruction selection for certain vector shuffles
This patch adds code to recognize vector shuffles which can be
represented as VDUP (splat) of a vector lane with of a different
(wider) type than the original vector lane type.

For example:
    shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
is essentially:
    shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0>

Such patterns are generated by the SelectionDAG machinery in some cases
(see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double
bitcasts from shuffles" part).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86225
2020-08-27 11:06:49 +01:00
Paul Walker 81337c915f [SVE] Fallback to default expansion when lowering SIGN_EXTEN_INREG from non-byte based source.
Differential Revision: https://reviews.llvm.org/D86394
2020-08-27 10:57:37 +01:00
Alex Richardson 5ba4d0365b [RISC-V] fmv.s/fmv.d should be as cheap as a move
Since the canonical floatig-point move is fsgnj rd, rs, rs, we should
handle this case in RISCVInstrInfo::isAsCheapAsAMove().

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D86518
2020-08-27 10:32:23 +01:00
Alex Richardson a11eeb4d4a [RISC-V] Mark C_MV as a move instruction
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D86517
2020-08-27 10:32:23 +01:00
Alex Richardson 2259ce8c91 [RISC-V] ADDI/ORI/XORI x, 0 should be as cheap as a move
The isTriviallyRematerializable hook is only called for instructions that are
tagged as isAsCheapAsAMove. Since ADDI 0 is used for "mv" it should definitely
be marked with "isAsCheapAsAMove". This change avoids one stack spill in most of
the atomic-rmw.ll tests functions. It also avoids stack spills in two of our
out-of-tree CHERI tests.
ORI/XORI with zero may or may not be the same as a move micro-architecturally,
but since we are already doing it for register == x0, we might as well
do the same if the immediate is zero.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D86480
2020-08-27 10:32:22 +01:00
Piotr Sobczak 4e9d207117 [AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.

Make sure the original register is preserved.

Differential Revision: https://reviews.llvm.org/D86541
2020-08-27 10:22:50 +02:00
Sam Parker 03141aa04a [ARM] Enable outliner at -Oz for M-class
Enable default outlining when the function has the minsize attribute
and we're targeting an m-class core.

Differential Revision: https://reviews.llvm.org/D82951
2020-08-27 08:02:56 +01:00
Martin Storsjö 04879086b4 Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit 9936455204.

That commit caused failed assertions e.g. like this:

$ cat alloca.c
a;
b() {
  float c;
  d();
  a = __builtin_alloca(d);
  c = e();
  f(a);
  return c;
}
$ clang -target aarch64-linux-gnu -c alloca.c -O2
clang: ../lib/Target/AArch64/AArch64InstrInfo.cpp:3446: void
llvm::emitFrameOffset(llvm::MachineBasicBlock&,
llvm::MachineBasicBlock::iterator, const llvm::DebugLoc&, unsigned int,
unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo*,
llvm::MachineInstr::MIFlag, bool, bool, bool*):
Assertion `(DestReg != AArch64::SP || Bytes % 16 == 0) &&
"SP increment/decrement not 16-byte aligned"' failed.
2020-08-27 09:39:56 +03:00
luxufan 888c02deee [RISCV] add the MC layer support of riscv vector Zvamo extension
Implements the assemble and disassemble support of RISCV Vector
extension zvamo instructions, base on the 0.9 spec version.

Reviewed  by HsiangKai

Differential Revision: https://reviews.llvm.org/D85069
2020-08-27 14:11:38 +08:00
Sam Parker a3e41d4581 [ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Differential Revision: https://reviews.llvm.org/D40061
2020-08-27 07:10:20 +01:00
Amy Kwan 76b0f99ea8 [PowerPC] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82609
2020-08-26 23:14:34 -05:00
Matt Arsenault 0b7f6cc71a GlobalISel: Add generic instructions for memory intrinsics
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
2020-08-26 20:08:45 -04:00
Craig Topper 92d3e70df3 [X86] Change pentium4 tuning settings and scheduler model back to their values before D83913.
Clang now defaults to -march=pentium4 -mtune=generic so we don't
need modern tune settings on pentium4.
2020-08-26 15:38:12 -07:00
Ahmed Bougacha 383f7c8858 [AArch64] Use CCAssignFnForReturn helper in more spots. NFC.
It was added for GISel, but SDAG could use it too!
2020-08-26 14:39:11 -07:00
Muhammad Asif Manzoor fd536eeed9 [AArch64][SVE] Add lowering for llvm fceil
Add the functionality to lower fceil for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84548
2020-08-26 15:59:44 -04:00
Owen Anderson 9936455204 Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
2020-08-26 19:38:38 +00:00
Francesco Petrogalli 61dfa00957 [MC][SVE] Fix data operand for instruction alias of `st1d`.
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d>`.

Differential Revision: https://reviews.llvm.org/D86633
2020-08-26 18:22:17 +00:00
Krzysztof Parzyszek e15143d31b [Hexagon] Implement llvm.masked.load and llvm.masked.store for HVX 2020-08-26 13:10:22 -05:00
Matt Arsenault f78687df9b AMDGPU: Don't assert on misaligned DS read2/write2 offsets
This would assert with unaligned DS access enabled. The offset may not
be aligned. Theoretically the pattern predicate should check the
memory alignment, although it is possible to have the memory be
aligned but not the immediate offset.

In this case I would expect it to use ds_{read|write}_b64 with
unaligned access, but am not clear if there's a reason it doesn't.
2020-08-26 14:08:05 -04:00
Craig Topper 09288bcbf5 [X86] Add assembler support for .d32 and .d8 mnemonic suffixes to control displacement size.
This is an older syntax than the {disp32} and {disp8} pseudo
prefixes that were added a few weeks ago. We can reuse most of
the support for that to support .d32 and .d8 as well.
2020-08-26 10:45:50 -07:00
Owen Anderson 9061eb8245 Revert "Fix frame pointer layout on AArch64 Linux."
This broke stage2 of clang-cmake-aarch64-full.

This reverts commit a0aed80b22.
2020-08-26 17:17:14 +00:00
jasonliu 413054400d [XCOFF][AIX] Support relocation generation for large code model
Summary:
Support TOCU and TOCL relocation type for object file generation.

Reviewed by: DiggerLin

Differential Revision: https://reviews.llvm.org/D84549
2020-08-26 17:12:28 +00:00
Owen Anderson a0aed80b22 Fix frame pointer layout on AArch64 Linux.
When floating point callee-saved registers were used, the frame pointer would
incorrectly point to the bottom of the CSR space (containing saved floating-point
registers), rather than to the frame record.

While all frame offsets were calculated consistently, resulting in working code,
this prevented stack walkers from being about to traverse the frame list.
2020-08-26 16:09:49 +00:00
Jay Foad a75e67b3b4 [AMDGPU] Make more use of Subtarget reference in SIInstrInfo 2020-08-26 15:04:00 +01:00
Matt Arsenault ff34116cf0 AMDGPU: Use Subtarget reference in SIInstrInfo 2020-08-26 09:18:41 -04:00
Matt Arsenault 21ccedc24f AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs
If the condition output is negated, swap the branch targets. This is
similar to what SelectionDAG does for when SelectionDAGBuilder
decides to invert the condition and swap the branches.

This is leaving behind a dead constant def for some reason.
2020-08-26 08:58:54 -04:00
Jay Foad 831457c6d5 [AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.

This is the same optimization that was implemented for SelectionDAG in
D31731.

Differential Revision: https://reviews.llvm.org/D86609
2020-08-26 13:47:51 +01:00
David Green 677c1590c0 [ARM] Increase MVE gather/scatter cost by MVECostFactor.
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.

In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.

Differential Revision: https://reviews.llvm.org/D86444
2020-08-26 13:03:46 +01:00
Pierre Gousseau cda6b09242 [X86] Make sure we do not clobber RBX with mwaitx when used as a base
pointer.

mwaitx uses EBX as one of its argument.
Using this instruction clobbers RBX as it is defined to hold one of the
input. When the backend uses dynamically allocated stack, RBX is used as
a reserved register for the base pointer.

This patch is adapted from @qcolombet patch for cmpxchg at r263325.

This fixes PR43528.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D73475
2020-08-26 11:20:31 +01:00
Cullen Rhodes 1f44dfb640 [AArch64][AsmParser] Fix bug in operand printer
The switch in AArch64Operand::print was changed in D45688 so the shift
can be printed after printing the register. This is implemented with
LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between
the register and shift operands.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D86535
2020-08-26 09:31:36 +00:00
Sander de Smalen 5f47d4456d [AArch64][SVE] Fix calculation restore point for SVE callee saves.
This fixes an issue where the restore point of callee-saves in the
function epilogues was incorrectly calculated when the basic block
consisted of only a RET instruction. This caused dealloc instructions
to be inserted in between the block of callee-save restore instructions,
rather than before it.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86099
2020-08-26 10:02:31 +01:00
Craig Topper 1d1515a9e2 [X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG
Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extra again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but though maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.
2020-08-25 18:20:43 -07:00
Craig Topper b8ec8f5776 [X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine.
The IsExtractedElement already called getOperand(0) so Extract
here is the source vector. We shouldn't call getOperand(0). This
worked for the original test cases because the result was a
bitcast so the getOperand(0) accidently peeked through the bitcast
which is what we wanted.

In the failing case here, the operand turns out to be undef so
the getOperand(0) asserts because undef has no operands.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184

Differential Revision: https://reviews.llvm.org/D86428
2020-08-25 16:16:54 -07:00
Craig Topper ba319ac47e [X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern.
KMOVWkr produces VK16, there's no reason to copy it to VK16 again.

Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
2020-08-25 15:19:27 -07:00
Stanislav Mekhanoshin b7760c3e5d [AMDGPU] Remove unsound dependency on ISA version in waitcnt
Differential Revision: https://reviews.llvm.org/D86566
2020-08-25 14:01:42 -07:00
Stanislav Mekhanoshin 817c831f02 [AMDGPU] Switch to named simm16 in vscnt insertion
Differential Revision: https://reviews.llvm.org/D86568
2020-08-25 13:05:27 -07:00
Ankit Aggarwal 2da1eefb58 [Hexagon] Check if EVT is simple type in HVX lowering 2020-08-25 15:02:44 -05:00
Krzysztof Parzyszek dcef5e0c37 [Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue)
Use isHvxOperation(SDNode*) instead.
2020-08-25 11:45:08 -05:00
Matt Arsenault 0d2fe90063 AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge
Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.
2020-08-25 09:40:20 -04:00