Commit Graph

57083 Commits

Author SHA1 Message Date
Matt Arsenault 0ba40d4ccf AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3]
Ports the existing DAG combines, minus the simplify demanded bits
which seems to have no equivalent now. Without these, this isn't
particularly helpful in most of the IR sample cases.
2020-04-13 19:18:19 -04:00
Austin Kerbow a69b3e010c [AMDGPU][GlobalISel] Fix div_scale in FDIV lowering
Differential Revision: https://reviews.llvm.org/D78004
2020-04-13 15:54:49 -07:00
Heejin Ahn ba40896f99 [WebAssembly] Fix try placement in fixing unwind mismatches
Summary:
In CFGStackify, `fixUnwindMismatches` function fixes unwind destination
mismatches created by `try` marker placement. For example,
```
try
  ...
  call @qux  ;; This should throw to the caller!
catch
  ...
end
```
When `call @qux` is supposed to throw to the caller, it is possible that
it is wrapped inside a `catch` so in case it throws it ends up unwinding
there incorrectly. (Also it is possible `call @qux` is supposed to
unwind to another `catch` within the same function.)

To fix this, we wrap this inner `call @qux` with a nested
`try`-`catch`-`end` sequence, and within the nested `catch` body, branch
to the right destination:
```
block $l0
  try
    ...
    try                 ;; new nested try
      call @qux
    catch               ;; new nested catch
      local.set n       ;; store exnref to a local
      br $l0
    end
  catch
    ...
  end
end
local.get n             ;; retrieve exnref back
rethrow                 ;; rethrow to the caller
```

The previous algorithm placed the nested `try` right before the `call`.
But it is possible that there are stackified instructions before the
call from which the call takes arguments.
```
try
  ...
  i32.const 5
  call @qux  ;; This should throw to the caller!
catch
  ...
end
```

In this case we have to place `try` before those stackified
instructions.
```
block $l0
  try
    ...
    try                 ;; this should go *before* 'i32.const 5'
      i32.const 5
      call @qux
    catch
      local.set n
      br $l0
    end
  catch
    ...
  end
end
local.get n
rethrow
```

We correctly handle this in the first normal `try` placement phase
(`placeTryMarker` function), but failed to handle this in this
`fixUnwindMismatches`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77950
2020-04-13 15:50:01 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Rahman Lavaee 05192e585c Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section.
Differential Revision: https://reviews.llvm.org/D76954
2020-04-13 12:19:59 -07:00
Rahman Lavaee 4ddf7ab454 Revert "Extend BasicBlock sections to allow specifying clusters of basic blocks"
This reverts commit 0d4ec16d3d Because
tests were not added to the commit.
2020-04-13 12:19:59 -07:00
Rahman Lavaee 0d4ec16d3d Extend BasicBlock sections to allow specifying clusters of basic blocks
in the same section.

This allows specifying BasicBlock clusters like the following example:
!foo
!!0 1 2
!!4
This places basic blocks 0, 1, and 2 in one section in this order, and
places basic block #4 in a single section of its own.
2020-04-13 11:46:11 -07:00
Matt Arsenault e6605a209c DAG: Fix wrong legality check for ISD::FMAD
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.

This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.

The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
2020-04-13 10:25:39 -07:00
Craig Topper 6dbf1a1229 [X86] Move X86ShuffleDecode.cpp/h into MCTargetDesc and remove X86Utils library. NFC
The shuffle decoding is used by X86ISelLowering and
MCTargetDesc/X86InstComments. The latter used to be in a
separate InstPrinter library. The Utils library existed to allow
InstPrinter and CodeGen to share the shuffle decoding. Since
X86InstComments now lives in the MCTargetDesc, which CodeGen
already depends on, we can sink the shuffle decoding there as well.

Differential Revision: https://reviews.llvm.org/D77980
2020-04-13 10:14:08 -07:00
Jay Foad bc78baec4c [X86] Improve combineVectorShiftImm
Summary:
Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as
well as arithmetic shifts. This is needed to prevent regressions from
an upcoming funnel shift expansion change.

While we're here, fold (VSRAI -1, C) -> -1 too.

Reviewers: RKSimon, craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77300
2020-04-13 15:54:55 +01:00
Simon Pilgrim 401cbe373b [X86][AVX] Attempt to scale masked shuffles to match the root type
Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type.

This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.
2020-04-13 14:57:25 +01:00
Simon Pilgrim fdd9ff9700 [X86][AVX] Create splitVectorIntBinary helper.
Removes duplicate code from split256IntArith/split512IntArith.
2020-04-13 13:09:38 +01:00
Austin Kerbow eab9a4f119 [AMDGPU] Don't assert on partial exec copy
After Machine CSE and coalescing we can end up with copies of exec to
subregister SGPRs.

Differential Revision: https://reviews.llvm.org/D77992
2020-04-12 21:14:36 -07:00
Craig Topper dbb272b0a3 [CallSite removal][FastISel] Use CallBase instead of CallSite in fastLowerCall. 2020-04-12 18:02:24 -07:00
Craig Topper 42fc7852f5 [X86] Print k-mask in FMA3 comments. 2020-04-12 13:16:53 -07:00
Craig Topper 95192f548d [CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface.
Differential Revision: https://reviews.llvm.org/D77929
2020-04-12 11:26:25 -07:00
Mircea Trofin d2f1cd5d97 [llvm][NFC] Refactor uses of CallSite to CallBase - call promotion
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guranteed non-null,
were replaced with CallBase references rather than pointers.

Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract that they are actually CallBase values.

Reviewers: davidxl, dblaikie, wmi

Reviewed By: dblaikie

Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77930
2020-04-12 08:27:29 -07:00
Sanjay Patel d04db4825a [x86] use vector instructions to lower FP->int->FP casts
As discussed in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c13
...we can avoid the likely slow round-trip from XMM to GPR to XMM
by using the vector versions of the convert instructions.

Based on experimental results from recent Intel/AMD chips, we don't
need to worry about triggering denorm stalls while operating on
garbage data in the high lanes with convert instructions, so this is
expected to always be as good or better perf than the scalar
instruction equivalent. FP exceptions are also not a concern because
strict code should not be using the regular SDAG opcodes.

Differential Revision: https://reviews.llvm.org/D77895
2020-04-12 10:26:43 -04:00
Simon Pilgrim 40581a0a2b [X86] Use isAnyZero shuffle mask helper where possible. NFC. 2020-04-12 10:57:29 +01:00
Craig Topper d3465e0691 [X86] Enable shuffle combining for AVX512 unless the root is used by a vselect
A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets.

I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing.

Differential Revision: https://reviews.llvm.org/D77928
2020-04-11 20:05:10 -07:00
Matt Arsenault 96819011ca AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts
These need to be promoted and scalarized for the SALU.
2020-04-11 20:55:33 -04:00
Matt Arsenault ac8d51a3c6 AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16
The current selector depends on 16-bit shifts using 16-bit shift
amount types, but really it should accept either for all types.
2020-04-11 18:12:26 -04:00
Craig Topper d1da1b53ff [X86] Cleanup ISD::BRIND handling code in X86DAGToDAGISel::Select. NFC
-Drop llvm:: on MVT::i32
-Use getValueType instead of getSimpleValueType for an equality
check just cause its shorter and doesn't matter.
-Don't create a const SDValue & since its cheap to copy.
-Remove explicit case from MVT enum to EVT.
-Add message to assert.
2020-04-11 15:01:05 -07:00
Craig Topper 21a7d08e72 [X86] Move code that replaces ISD::VSELECT with X86ISD::BLENDV from X86DAGToDAGISel::Select to PreprocessISelDAG 2020-04-11 15:01:05 -07:00
Matt Arsenault c5497e5399 AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects 2020-04-11 15:59:51 -04:00
Fangrui Song 0a55d3f557 [MC] Default MCAsmInfo::UseIntegratedAssembler to true 2020-04-11 10:13:52 -07:00
Fangrui Song d2e5157c1f [MC] Add UseIntegratedAssembler = false. NFC 2020-04-11 10:13:49 -07:00
Matt Arsenault cf29333f40 AMDGPU/GlobalISel: Work around forming illegal zextload after legalize
Selection would fail after the post legalize combiner put an illegal
zextload back together.

The base combiner has parameter to only allow legal operations, but
they appear to not be used. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
2020-04-11 10:52:58 -04:00
Sanjay Patel 1318ddbc14 [VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.

While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
2020-04-11 10:05:49 -04:00
Nemanja Ivanovic 512600e3c0 [PowerPC] Handle f16 as a storage type only
The PPC back end currently crashes (fails to select) with f16 input. This patch
expands it on subtargets prior to ISA 3.0 (Power9) and uses the HW conversions
on Power9.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39865

Differential revision: https://reviews.llvm.org/D68237
2020-04-11 07:34:47 -05:00
Craig Topper 9c1842d8af Change FastISel::CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This is the same as what was done to the CallLoweringInfo in
TargetLowering.h in r309159.

This is just a step on the way to replacing this with CallBase.
2020-04-10 23:45:36 -07:00
Nemanja Ivanovic 04eae39617 [PowerPC] Another folow-up fix for 6c4b40def7
There was another issue introduced by this commit that the OP
initially missed. Namely, for functions that are free to use
R2 as a callee-saved register, we emit a TOC expression based
on the address of the GEP label without emitting the GEP label.
Since we only emit such expressions for the large code model, this
issue only surfaced there.

I have confirmed that with this fix, the kernel build is successful
with target "all".
2020-04-10 21:09:59 -05:00
Scott Constable 0505181006 [X86] Fix to X86LoadValueInjectionRetHardeningPass for possible segfault
`MBB.back()` could segfault if `MBB.empty()`. Fixed by checking for `MBB.empty()` in the loop.

Differential Revision: https://reviews.llvm.org/D77584
2020-04-10 18:28:08 -07:00
Sam Clegg 16206ee07d [WebAssembly] Minor cleanup to WebAssemblySubtarget. NFC.
Pretty much all other platforms pass CPU string as arg0 of
initializeSubtargetDependencies.

Differential Revision: https://reviews.llvm.org/D77894
2020-04-10 16:47:39 -07:00
Craig Topper b8a108140d [CallSite removal][X86] Remove uses of CallSite from X86WinEHState.cpp
Differential Revision: https://reviews.llvm.org/D77862
2020-04-10 11:34:06 -07:00
Fangrui Song a7aaaf7016 [MC][RISCV] Make .reloc support arbitrary relocation types
Similar to D76746 (ARM), D76754 (AArch64) and llvmorg-11-init-6967-g152d14da64c (x86)

Differential Revision: https://reviews.llvm.org/D77018
2020-04-10 10:43:53 -07:00
Craig Topper a6732069ee [CallSite removal][X86] Remove unneeded use of CallSite. NFC
We already have a CallInst, we can just get the calling convention from it.
2020-04-10 10:27:21 -07:00
Fangrui Song 7f36cb1f1a [AArch64InstPrinter] Change printAlignedLabel to print the target address in hexadecimal form
Similar to D76580 (x86) and D76591 (PPC).

```
// llvm-objdump -d output (before)
10000: 08 00 00 94                   bl      #32
10004: 08 00 00 94                   bl      #32

// llvm-objdump -d output (after)
10000: 08 00 00 94                   bl      0x10020
10004: 08 00 00 94                   bl      0x10024

// GNU objdump -d. The lack of 0x is not ideal due to ambiguity.
10000:       94000008        bl      10020 <bar+0x18>
10004:       94000008        bl      10024 <bar+0x1c>
```

The new output makes it easier to find the jump target.

Differential Revision: https://reviews.llvm.org/D77853
2020-04-10 09:21:09 -07:00
Simon Pilgrim 1824ae0f42 [X86] Remove defunct EmitLoweredAtomicFP declaration. NFC. 2020-04-10 17:05:07 +01:00
Simon Pilgrim dd84a2f77a [X86] Remove defunct emitFMA3Instr declaration. NFC. 2020-04-10 17:05:06 +01:00
Christopher Tetreault 65b8b643b4 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: sdesmalen, efriedma, jonpa

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77265
2020-04-10 08:43:32 -07:00
Simon Pilgrim a88cc20456 ProfileSummaryInfo.h - remove unnecessary includes. NFC
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.

This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
2020-04-10 16:25:48 +01:00
Stanislav Mekhanoshin 44920e8566 [AMDGPU] Disable sub-dword scralar loads IR widening
These will be widened in the DAG. In the meanwhile early
widening prevents otherwise possible vectorization of
such loads.

Differential Revision: https://reviews.llvm.org/D77835
2020-04-10 08:20:49 -07:00
Simon Pilgrim 91bc50c0d7 [CostModel][X86] Improve InsertElement costs for sub-128bit vectors
If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs
2020-04-10 14:55:46 +01:00
Michael Liao b54b4ecac3 Fix `-Wextra` warning. NFC. 2020-04-10 03:22:02 -04:00
Kai Luo b7d5229d78 [PowerPC] Update alignment for ReuseLoadInfo in LowerFP_TO_INTForReuse
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is
not updated for following loads.

It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297

Differential Revision: https://reviews.llvm.org/D77624
2020-04-10 05:49:19 +00:00
Nemanja Ivanovic 7f3787c0f2 [PowerPC] Bail out of redundant LI elimination on an implicit kill
The transformation currently does not differentiate between explicit
and implicit kills. However, it is not valid to later simply clear
an implicit kill flag since the kill could be due to a call or return.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374
2020-04-09 22:17:29 -05:00
Heejin Ahn b647de9925 [WebAssembly] Use dummy debug info in Emscripten SjLj
Summary:
D74269 added debug info to newly created instructions, including calls
to `malloc` and `free`, by taking debug info from existing instructions
around, whose debug info may or may not be empty.

But there are cases debug info is required by the IR verifier: when both
the caller and the callee functions have DISubprograms, meaning we
already have declarations to `malloc` or `free` with a DISubprogram
attached, newly created calls to `malloc` and `free` should have
non-empty debug info. This patch creates a non-empty dummy debug info in
this case to those calls to make the IR verifier pass.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45461.

Reviewers: dschuff

Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77784
2020-04-09 18:44:50 -07:00
Stefan Pintilie 5b18b6e9a8 [PowerPC][Future] Fix for 6c4b40def7
This is a fix for the previous patch 6c4b40def7.
In some cases it may be possible to have the compiler produce st_other=1 without
the compiler using mcpu=future which should not be the case. This patch adds a
guard to make sure that if we are using st_other=1 then we are also compiling
for future CPU.
2020-04-10 01:12:11 +00:00
Craig Topper 5625e6ab37 [X86] Improve min/max reduction costs.
This is similar to what I recently did for getArithmeticReductionCost.

I'm trying to account for the narrowing from 512->256->128 as we go.

I've also added a new helper method getMinMaxCost that tries to
handle the cases where we have native min/max instructions and
fall back to cmp+select when we don't.

Differential Revision: https://reviews.llvm.org/D76634
2020-04-09 17:28:50 -07:00
Nemanja Ivanovic 5fe2809447 [PowerPC] Don't assert on SELECT_CC with i1 type
When we try to select a SELECT_CC on Power9, we check if it can be matched to a
SETB instruction. In that function, we assert that the output type is i32/i64.
This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC.
Change that from an assert to an early exit condition.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448
2020-04-09 19:27:32 -05:00
Amara Emerson e99169f1c2 [AArch64][GlobalISel] CallLowering: Don't generate new copies each time we need
to store to a stack location for outgoing args.

During call arg lowering we shouldn't be modifying SP so cache the SP copy
vreg for subsequent uses.

Gives a 0.2% geomean code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D77838
2020-04-09 17:08:56 -07:00
Christopher Tetreault 9f87d951fc Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: mcrosier, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77269
2020-04-09 16:43:29 -07:00
James Y Knight 5e7b98fe75 Fix an unused-variable warning in Release mode. 2020-04-09 16:34:55 -04:00
Christopher Tetreault e634f482ea Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: arsenm, efriedma, sdesmalen

Reviewed By: arsenm

Subscribers: wdng, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77268
2020-04-09 13:11:37 -07:00
Christopher Tetreault e1e131ea5e Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: grosbach, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77271
2020-04-09 12:52:44 -07:00
Stefan Pintilie 64868cbfcf [PowerPC][Future] Fix for 75828ef615
Used unsigned long where uint64_t should have been used by mistake.
Fixed in this patch.
2020-04-09 19:33:12 +00:00
Simon Pilgrim 12c629ec6c [CostModel][X86] Add shuffle costs for some common sub-128bit vectors
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
2020-04-09 19:57:06 +01:00
Paolo Savini fae40bd5a1 [RISCV] Add MC layer support for proposed Bit Manipulation extension (version 0.92)
This adds the instruction encoding and mnenomics for the proposed
RISC-V Bit Manipulation extension (version 0.92). It is implemented with
each category of instruction as its own target feature, with the 'b'
extension feature enabling all options. Since this extension is not yet
ratified, all target features are prefixed with 'experimental-' to note
their status.

Differential Revision: https://reviews.llvm.org/D65649
2020-04-09 18:04:22 +01:00
jasonliu 085689d44c [PPC][AIX] Implement variadic function handling in LowerFormalArguments_AIX
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.

Authored by: ZarkoCA

Reviewers: cebowleratibm, sfertile, jasonliu

Reviewed by: jasonliu

Differential Revision: https://reviews.llvm.org/D76130
2020-04-09 16:49:44 +00:00
Stefan Pintilie 75828ef615 [PowerPC][Future] Initial support for PCRel addressing for constant pool loads
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.

Differential Revision: https://reviews.llvm.org/D74486
2020-04-09 11:17:23 -05:00
Kazushi (Jam) Marukawa 015dee1ac8 [VE] Support (m)0 and (m)1 operands
Summary:
VE has special operands to represent 0b000...000111...111 (`(m)0`) and
0b111...111000...000 (`(m)1`) bit sequences.  This patch supports those
operands not only in machine instructions but also in DAG lowering.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77769
2020-04-09 18:09:00 +02:00
Craig Topper 5a55363dc4 [X86] Remove redundant VMOVDDUPZ128rmk/VMOVDDUPZ128rmkz isel patterns.
These patterns are identical to the pattern for the instruction.
2020-04-09 09:06:58 -07:00
Simon Cook 2df6a02fd7 [RISCV] Implement evaluateBranch
This implements the instruction analysis required to print branch
targets as part of llvm-objdump's disassembly.

Note, this only handles those branches which can be analyzed in a single
instruction, a future patch will handle multiple-instruction patterns,
such as AUIPC/LUI+JALR instruction pairs.

Differential Revision: https://reviews.llvm.org/D77567
2020-04-09 15:11:55 +01:00
Shengchen Kan 2477cec2ac [NFC][X86] Refine code in X86AsmBackend
Summary: Move code to a better place, rename function, etc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77778
2020-04-09 21:31:52 +08:00
Jay Foad 9c7bd94ce8 Fix typo in comment 2020-04-09 10:36:00 +01:00
Jay Foad 4970a1deca [AMDGPU] Remove outdated comment 2020-04-09 10:36:00 +01:00
Jay Foad c63aed890e [KnownBits] Move AND, OR and XOR logic into KnownBits
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.

This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74060
2020-04-09 10:10:37 +01:00
WangTianQing a3dc949000 [X86] Add TSXLDTRK instructions.
Summary: For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Reviewers: craig.topper, RKSimon, LuoYuanke

Reviewed By: craig.topper

Subscribers: mgorny, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77205
2020-04-09 13:17:29 +08:00
Matt Arsenault 0aa0d70067 MIR: Use Register 2020-04-08 22:07:26 -04:00
Sam Clegg 7baad0c53c [WebAssembly][MC] Use StringRef over std::string pointer
This is followup based on feedback on 5be42f36f5.
See: https://reviews.llvm.org/D77627.

Differential Revision: https://reviews.llvm.org/D77674
2020-04-08 18:28:08 -07:00
Christopher Tetreault 49fd24fe9e Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: hfinkel, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77266
2020-04-08 16:10:55 -07:00
Artem Belevich a9627b7ea7 [CUDA] Add partial support for recent CUDA versions.
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.

Differential Revision: https://reviews.llvm.org/D77670
2020-04-08 11:19:44 -07:00
Sean Fertile d0b57b41f4 [PowerPC][AIX][NFC] Replace deprecated getByValAlign call.
Replace call to deprecated 'getByValAlign()' with
'getNonZeroByValAlign()'.
2020-04-08 13:27:39 -04:00
Matt Arsenault dcce3ef1d2 FastISel: Partially use Register
Doesn't try to convert the cases that depend on generated code.
2020-04-08 12:10:58 -04:00
Matt Arsenault ca0ace7298 CodeGen: Use Register in MachineBasicBlock 2020-04-08 12:10:58 -04:00
Matt Arsenault 84aa58cbe2 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Sean Fertile 8abfd2c3bb [PowerPC][AIX] Enable passing byval formal arguments in multiple registers.
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).

Differential Revision: https://reviews.llvm.org/D76902
2020-04-08 11:16:33 -04:00
Stefan Pintilie 6c4b40def7 [PowerPC][Future] Add Support For Functions That Do Not Use A TOC.
On PowerPC most functions require a valid TOC pointer.

This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.

This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.

Differential Revision: https://reviews.llvm.org/D73664
2020-04-08 08:07:35 -05:00
Simon Pilgrim 66c18c729d [X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents
Tests derived from PR42035 examples
2020-04-08 12:42:22 +01:00
Shengchen Kan 916044d819 [X86][MC] Support enhanced relaxation for branch align
Summary:
Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization.

The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable".

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76286
2020-04-08 19:08:19 +08:00
Anna Welker 89e1248d7b [ARM][MVE] Optimise offset addresses of gathers/scatters
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding to execute them each time.

Differential Revision: https://reviews.llvm.org/D76681
2020-04-08 11:46:57 +01:00
Kazushi (Jam) Marukawa aa034867f1 [VE] Simplify definitions of uimm6 and simm7
Summary: To prepare continuous changes, simplify uimm6 and simm7 operands.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77700
2020-04-08 09:53:42 +02:00
Stanislav Mekhanoshin f96810ff34 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Eli Friedman 565b56a72c [NFC] Clean up uses of LoadInst constructor. 2020-04-07 16:28:53 -07:00
Fangrui Song 624654fd64 [VE] Migrate to the getMachineMemOperand overload using llvm::Align
Just delete the deprecated overload because nothing uses it.
2020-04-07 16:04:54 -07:00
Matt Arsenault 6011627f51 CodeGen: More conversions to use Register 2020-04-07 18:54:36 -04:00
Fangrui Song 2f8fb4d1cd [VE] Adapt aa26dd9858 and 2481f26ac3 2020-04-07 15:45:19 -07:00
Stanislav Mekhanoshin 96e51ed005 [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Matt Arsenault 2481f26ac3 CodeGen: Use Register in TargetFrameLowering 2020-04-07 17:07:44 -04:00
Matt Arsenault aa26dd9858 CodeGen: Use Register in more places 2020-04-07 15:59:40 -04:00
Nemanja Ivanovic ecd8435483 [NFC][PowerPC] Fix register class for patterns using XXPERMDIs
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
2020-04-07 14:06:08 -05:00
Graham Sellers a19a56f6a1 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
Craig Topper c41685b16f [SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector.
This removes a call to getScalarType from a bunch of call sites.
It also makes the behavior consistent with SIGN_EXTEND_INREG.

Differential Revision: https://reviews.llvm.org/D77631
2020-04-07 11:34:08 -07:00
Eli Friedman e9ac757f79 [AArch64] Don't expand memcmp in strict align mode.
7aecf232 fixed the bug where we would miscompile, but we still generate
a crazy amount of code. Turn off the expansion until someone implements
an appropriate heuristic.

Differential Revision: https://reviews.llvm.org/D77599
2020-04-07 10:53:36 -07:00
Matt Arsenault f596ab4066 AMDGPU: Use early return 2020-04-07 13:48:00 -04:00
Sam Clegg 5be42f36f5 [WebAssembly][MC] Fix leak of std::string members in MCSymbolWasm
Summary: Fixes: https://bugs.llvm.org/show_bug.cgi?id=45452

Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77627
2020-04-07 10:38:43 -07:00
Stanislav Mekhanoshin 12a324393d [AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.

Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.

This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.

Differential Revision: https://reviews.llvm.org/D77610
2020-04-07 10:27:23 -07:00
David Tenty b9245f14b7 [NFC][PowerPC] Cleanup 64-bit and Darwin CalleeSavedRegs
Summary:
- Remove the no longer used Darwin CalleeSavedRegs
- Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64
- Update tests for 64-bit CSR change

Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77235
2020-04-07 11:49:10 -04:00
Simon Pilgrim e3b6059776 [X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors (PR45443)
Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets).

If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail.

Fixes PR45443
2020-04-07 14:45:29 +01:00
Keith Walker 01dc10774e [ARM] unwinding .pad instructions missing in execute-only prologue
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.

Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.

As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.

Differential Revision: https://reviews.llvm.org/D76849
2020-04-07 11:51:59 +01:00
Peter Smith 14c1e98754 [ARM] Remove condition that could never be true
From Arm v8 Architecture Reference Manual F5.1.84 LDREXD
The ldrexd instruction in Arm state has the following conditions:

t = UInt(Rt); t2 = t + 1; n = UInt(Rn);
if Rt<0> == '1' || t2 == 15 || n == 15 then UNPREDICTABLE;

In when Rt is odd or if Rt is 14 (making t2 15).

In the implementation when the pair is the UNPREDICTABLE R14_R15 we
would ideally return SOFT_FAIL. We can't because there is no R14_R15
value for us to return so we fail early returning FAIL.

The early return for registers outside the bounds of the table means
the check for Rt == 14 (0xE) redundant which causes a static analyzer
to flag the condition as never being true.

To fix the warning I've removed the check and replaced with a comment
explaining the difference with the specification.

Fixes pr41660

Differential Revision: https://reviews.llvm.org/D77463
2020-04-07 09:50:56 +01:00
Sam Clegg f0bbf3d086 [WebAssembly] EmscriptenEHSjLj: Mark more functions as imported
These should have been part of https://reviews.llvm.org/D77192

Differential Revision: https://reviews.llvm.org/D77358
2020-04-06 21:27:31 -07:00
Xiang1 Zhang 01a32f2bd3 Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)
Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot fail(not suitable for window 32 target).

Summary:
This patch comes from H.J.'s 2bd54ce7fa

**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)

The reason we enable IBT at "JIT compiled with CET" is mainly that:  the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled,  JIT must generate CET code or it will cause Control protection exceptions.

I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)

Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei

Reviewed By: LuoYuanke

Subscribers: tstellar, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76900
2020-04-07 09:48:47 +08:00
Eli Friedman 68b03aee1a Remove SequentialType from the type heirarchy.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.

In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.

In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)

Differential Revision: https://reviews.llvm.org/D75661
2020-04-06 17:03:49 -07:00
David Blaikie 5aead592f0 X86ISelLowering: Minor refactor to avoid redundant initialization while ensuring compiler warnings can hopefully still prove initialization
Based on post-commit review/discussion in fabe52a7412b
2020-04-06 14:25:52 -07:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Matt Arsenault 869f05c834 AMDGPU: Remove dead paths for requiresUniformRegister
The extracts from control flow intrinsics are already properly handled
by divergence analysis. The inline asm case isn't dead, but has also
never really worked correctly so leave it as-is for now.
2020-04-06 16:15:10 -04:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Fangrui Song a5d375e0cb [AArch64] Allow logical immediates to have all-1 in top bits
So that constant expressions like the following are permitted:

and w0, w0, #~(0xfe<<24)
and w1, w1, #~(0xff<<24)

The behavior matches GNU as (opcodes/aarch64-opc.c:aarch64_logical_immediate_p).

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75885
2020-04-06 09:56:04 -07:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Matt Arsenault e87ec66762 AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll 2020-04-06 11:50:16 -04:00
Jay Foad ddd2f4b96f [AMDGPU] Fix inaccurate comments 2020-04-06 16:44:08 +01:00
Chris Bowler d6ea82d11c [AIX][PPC] Implement by-val caller arguments in multiple registers
Differential Revision: https://reviews.llvm.org/D76380
2020-04-06 11:06:51 -04:00
Matt Arsenault cbf719b568 AMDGPU: Use DAG patterns for div_fmas 2020-04-06 09:28:30 -04:00
Matt Arsenault 79b29d6df7 AMDGPU: Remove DisableInst feature
I'm not sure why these were bothering to check the instruction
profile, since those profiles should only be used with these
instruction classes.
2020-04-06 09:27:44 -04:00
Hans Wennborg 64c2312750 Revert 43f031d312 "Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)"
ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit
windows, see llvm-commits thread for fef2dab.

This reverts commit 43f031d312
and the follow-ups fef2dab100 and
6a800f6f62.
2020-04-06 15:05:25 +02:00
Guillaume Chatelet ff858d7781 [Alignment][NFC] Add DebugStr and operator*
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
 - DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
   - This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
 - Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77537
2020-04-06 12:09:45 +00:00
Simon Pilgrim 9bc5b1a489 [X86][SSE] combineVectorSignBitsTruncation - remove minimum vector length limitations
truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.
2020-04-06 12:45:23 +01:00
Kazushi (Jam) Marukawa e981a46a77 [VE] Update lea/load/store instructions
Summary:
Modify lea/load/store instructions to accept `disp(index, base)`
style addressing mode (called ASX format).  Also, uniform the
number of DAG nodes to have 3 operands for this ASX format
instructions, and update selectADDR functions to lower
appropriate MI.

Reviewers: arsenm, simoll, k-ishizaka

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D76822
2020-04-06 11:49:46 +02:00
Oliver Stannard a294d9eb21 Revert "[IPRA][ARM] Spill extra registers at -Oz"
Reverting because this is causing failures on bots with expensive checks
enabled.

This reverts commit 73cea83a6f.
2020-04-06 10:34:59 +01:00
Kerry McLaughlin 944e322f88 [AArch64][SVE] Add SVE intrinsics for saturating add & subtract
Summary:
Adds the following intrinsics:
  - @llvm.aarch64.sve.[s|u]qadd.x
  - @llvm.aarch64.sve.[s|u]qsub.x

Reviewers: sdesmalen, c-rhodes, dancgr, efriedma, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77054
2020-04-06 10:07:08 +01:00
Guillaume Chatelet 6000478f39 Revert "[Alignment][NFC] Add DebugStr and operator*"
This reverts commit 1e34ab98fc.
2020-04-06 07:55:25 +00:00
Guillaume Chatelet 1e34ab98fc [Alignment][NFC] Add DebugStr and operator*
Summary:
Also updates files to use them.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77394
2020-04-06 07:12:46 +00:00
Simon Pilgrim a43e233606 Remove unused function 'isInRange'. NFCI. 2020-04-05 23:11:24 +01:00
Simon Pilgrim 4431a29c60 [X86][SSE] Combine unary shuffle(HORIZOP,HORIZOP) -> HORIZOP
We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes.

For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.
2020-04-05 22:49:46 +01:00
Apelete Seketeli 8aadb442d1 [scan-build] fix dead store warnings emitted on LLVM AMDGPU code base
This fixes dead store warnings of the type "dead assignment" reported
by Clang Static Analyzer.
2020-04-05 11:19:03 -04:00
Oliver Stannard cb6aeb2239 [ARM] Add data gathering hint instruction
Summary:
This patch upstreams support the optional ARMv8.0 Data Gathering Hint (DGH)
extension, which adds the Data Gathering Hint instruction to the hint
space.

See ARMv8.0-DGH in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, danielkiss, samparker

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77097
2020-04-05 15:21:00 +01:00
Oliver Stannard 6f60eb4a3c [ARM] Add enhanced counter virtualization system registers
Summary:
This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization
(ECV) extension, which adds 6 new system registers.

See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77094
2020-04-05 15:18:35 +01:00
Oliver Stannard 9e1455dc23 [ARM] Add ARMv8.6 Fine Grain Traps system registers
Summary:
This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT)
extension, which adds 5 new system registers.

See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov

Reviewed By: SjoerdMeijer

Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76991
2020-04-05 14:28:18 +01:00
Diogo Sampaio 59d10dc703 [ARM] add ARMv8.6-A Activity monitors virtualization extension
Summary:
This patch upstreams v8.6A activity monitors virtualization
assembler support, which consists of 32 new system
registers (two groups, each with 16 numbered registers).

See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more
information.

Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard

Reviewed By: ostannard

Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76998
2020-04-05 13:31:06 +01:00
Benjamin Kramer ff889df356 [X86] Roll some loops. NFCI. 2020-04-05 13:59:50 +02:00
Simon Pilgrim 3079e51858 [X86][SSE] Generalize shuffle(HORIZOP,HORIZOP) -> HORIZOP combine
Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD.

This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.
2020-04-05 12:09:19 +01:00
Simon Pilgrim a17de6b91c [X86][SSE] truncateVectorWithPACK - upper undef for 128->64 packing
If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.
2020-04-05 11:47:36 +01:00
Matt Arsenault 6bfe28e92f AMDGPU: Fix annotate kernel features through casted calls
I thought I was testing this before, but the workitem id x
case isn't great since it's mandatory in the parent kernel.
2020-04-04 20:44:44 -04:00
Matt Arsenault 221890d709 AMDGPU: Add feature for fast f32 denormals 2020-04-04 20:01:24 -04:00
Simon Pilgrim e5e719d885 [X86][SSE] lowerV8I16Shuffle - lower compaction shuffles using PACKUSDW(PBLENDW,PBLENDW) on SSE41+
Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW but that requires sign extension that seems to destroy any gains, even on targets without PSHUFB.

This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful.

Noticed in codegen while dealing with PR31443.
2020-04-04 13:08:25 +01:00
Craig Topper 1d42c0db9a Revert "[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets"
This reverts commit c74dd640fd.

Reverting to address coding standard issues raised in post-commit
review.
2020-04-03 16:56:08 -07:00
Craig Topper a505ad58cf Revert "[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)"
This reverts commit 62c42e29ba

Reverting to address coding standard issues raised in post-commit
review.
2020-04-03 16:55:53 -07:00
Scott Constable 62c42e29ba [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
After finding all such gadgets in a given function, the pass minimally inserts
LFENCE instructions in such a manner that the following property is satisfied:
for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
least one LFENCE instruction. The algorithm that implements this minimal
insertion is influenced by an academic paper that minimally inserts memory
fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
  -SOURCE instructions (also includes function arguments)
  -SINK instructions
  -Basic block entry points
  -Basic block terminators
  -LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids
inserting LFENCEs into loops unless absolutely necessary. There is also a
CLI option to load a plugin that can provide even better optimization,
inserting fewer fences, while still mitigating all of the LVI gadgets.
The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin,
and a description of the pass's behavior with the plugin can be found here:
https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Differential Revision: https://reviews.llvm.org/D75937
2020-04-03 13:45:50 -07:00
Scott Constable c74dd640fd [X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.

More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).

Also adds a new target feature to X86: +lvi-load-hardening

The feature can be added via the clang CLI using -mlvi-hardening.

Differential Revision: https://reviews.llvm.org/D75936
2020-04-03 13:02:04 -07:00
Scott Constable f95a67d8b8 [X86] Add RET-hardening Support to mitigate Load Value Injection (LVI)
Adding a pass that replaces every ret instruction with the sequence:

pop <scratch-reg>
lfence
jmp *<scratch-reg>

where <scratch-reg> is some available scratch register, according to the
calling convention of the function being mitigated.

Differential Revision: https://reviews.llvm.org/D75935
2020-04-03 12:08:34 -07:00
Matt Arsenault 30ebafaa56 CodeGen: Convert some TII hooks to use Register 2020-04-03 14:52:54 -04:00
Matt Arsenault 178050c3ba AMDGPU: Use Register in more places 2020-04-03 14:52:54 -04:00
Matt Arsenault e8dcb6d05e AMDGPU: Remove redundant virtual 2020-04-03 14:52:53 -04:00
Christopher Tetreault b600809688 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: kparzysz, sdesmalen, efriedma

Reviewed By: kparzysz

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77267
2020-04-03 11:26:51 -07:00
Stanislav Mekhanoshin 0462795095 [AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.

Differential Revision: https://reviews.llvm.org/D77344
2020-04-03 11:23:02 -07:00
Simon Pilgrim 34a497b765 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended.

This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.
2020-04-03 18:26:10 +01:00
John Brawn 4ad9ca0f9e [ARM] Fix incorrect handling of big-endian vmov.i64
Currently when the target is big-endian vmov.i64 reverses the order of the two
words of the vector. This is correct only when the underlying element type is
32-bit, as actually what it should be doing is considering it a vector of the
underlying type and reversing the elements of that.

Differential Revision: https://reviews.llvm.org/D76515
2020-04-03 17:36:50 +01:00