Commit Graph

75411 Commits

Author SHA1 Message Date
Jay Foad 892ef2e3c0 [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028
2020-09-22 10:41:38 +01:00
Sam Parker b4fa884a73 [ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that make up the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate is used in multiple blocks.

It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of its inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in the
remat-vctp.ll test, where we still don't perform the transform.

Differential Revision: https://reviews.llvm.org/D87681
2020-09-22 10:40:27 +01:00
Muhammad Omair Javaid 73a6a164b8 Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
2020-09-22 14:40:06 +05:00
Georgii Rymar 28b84dd138 [llvm-readobj/elf] - Stop reporting invalid extended indexes in warnings for unnamed section symbols.
We have an issue with `getFullSymbolName`: it assumes that the symbol passed is
always in the `.symtab`, which is wrong. We might currently calculate and report a wrong index.
I've added a test case revealing that.

This patch adds the "symbol index" argument to the `getFullSymbolName` signature,
which fixes the issue.

Differential revision: https://reviews.llvm.org/D87899
2020-09-22 11:55:15 +03:00
Sam Parker a0c1dcc318 [ARM] Remove MVEDomain from VLDR/STR of P0
Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.

Differential Revision: https://reviews.llvm.org/D87900
2020-09-22 09:05:50 +01:00
Arthur Eubanks 3bf703fb6d [AlwaysInliner] Emit optimization remarks
To match the normal inliner in preparation for https://reviews.llvm.org/D86988.

Also change a FIXME to an assert.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88067
2020-09-21 22:09:28 -07:00
Dominic Chen 9c7b58080e [WebAssembly][MC] Fix computation of relative symbol offset
For relative symbols, add its offset when computing relocation value.
Also, warn on unsupported absolute symbols.

Differential Revision: https://reviews.llvm.org/D87407
2020-09-22 00:53:23 -04:00
Serguei Katkov 5502cfa091 [LoopUnswitch] Trivial simplification: remove trivial dead condition after unswitch
Non-trivial loop unswitching can leave the dead condition instruction behind.
This change adds trivial dead code elimination for the unused condition.

Reviewers: asbirlea, aqjune, fhahn, DaniilSuchkov, reames
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88014
2020-09-22 09:04:59 +07:00
Arthur Eubanks 89df0fda17 [UnifyLoopExits] Pin tests with -unify-loop-exits to legacy PM
The pass is not used in tree, so no reason to port it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88058
2020-09-21 18:08:58 -07:00
Arthur Eubanks 9db0c572c1 [Delinearization][NewPM] Port delinearization to NPM
Also make tests in Analysis/Delinearization work under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87741
2020-09-21 17:59:08 -07:00
Arthur Eubanks 84a8ca1e6c [NewPM] Pin -lazy-branch-prob and -lazy-block-freq tests to legacy PM
NPM passes just use the normal versions of these analyses instead.
Also pin any tests with -analyze to legacy PM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87857
2020-09-21 17:51:46 -07:00
Fangrui Song 8fdac7cb7a Revert D71539 "Recommit "[SCEV] Look through single value PHIs.""
This reverts commit 11dccf8d3a.

A bootstrapped clang crashes (due to ArrayRef::front called on an empty
ArrayRef) when compiling some files.  Very strangely, this only reproduces with
modules.

```
13 0x0000564d3349e968 llvm::ArrayRef<llvm::BasicBlock*>::front() const /proc/self/cwd/llvm/include/llvm/ADT/ArrayRef.h:160:7
14 0x0000564d3349e896 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getHeader() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfo.h:104:50
15 0x0000564d3349fd9d llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getLoopLatch() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfoImpl.h:210:11
16 0x0000564d33593c8a llvm::ScalarEvolution::computeBackedgeTakenCount(llvm::Loop const*, bool) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6933:15
17 0x0000564d33592ebc llvm::ScalarEvolution::getBackedgeTakenInfo(llvm::Loop const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:0:30
18 0x0000564d33593a54 llvm::ScalarEvolution::getBackedgeTakenCount(llvm::Loop const*, llvm::ScalarEvolution::ExitCountKind) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6487:36
19 0x0000564d32be2402 llvm::ScalarEvolution::getConstantMaxBackedgeTakenCount(llvm::Loop const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:768:5
20 0x0000564d33590807 llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:5495:19
21 0x0000564d320abab7 llvm::ScalarEvolution::getSignedRange(llvm::SCEV const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:840:12
22 0x0000564d335a03aa llvm::ScalarEvolution::isKnownPredicateViaConstantRanges(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:9239:60
23 0x0000564d33586a80 llvm::ScalarEvolution::isKnownViaNonRecursiveReasoning(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:10284:60
```
2020-09-21 17:21:43 -07:00
Krzysztof Parzyszek ae3f54c1e9 [EarlyCSE] Handle masked loads and stores
Extend the handling of memory intrinsics to also include
non-target-specific intrinsics, in particular masked loads and stores.

Invent "isHandledNonTargetIntrinsic" to distinguish intrinsics
that should be handled natively from intrinsics that can be
passed to TTI.

Add code that handles masked loads and stores and update the
testcase to reflect the results.

Differential Revision: https://reviews.llvm.org/D87340
2020-09-21 18:47:10 -05:00
Arthur Eubanks 44b1643d17 [NewPM] Support -disable-simplify-libcall/-disable-builtin in NPM opt
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87932
2020-09-21 16:38:37 -07:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Kazu Hirata ca8321574d Fix comment typos. NFC. 2020-09-21 16:12:56 -07:00
Amara Emerson e3f5046e44 [AArch64][GlobalISel] Merge selection of vector-vector G_ASHR/G_LSHR and support more cases.
The vector-immediate cases are handled elsewhere in an earlier commit.
2020-09-21 16:04:52 -07:00
Amara Emerson a513fdec90 [AArch64][GlobalISel] Add a post-legalize combine for lowering vector-immediate G_ASHR/G_LSHR.
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this
matching build_vector of constant operands.

With this, we get selection for free.
2020-09-21 16:04:52 -07:00
Amara Emerson 825203daae [AArch64][GlobalISel] Make <4 x s16> G_ASHR and G_LSHR legal.
Selection support for these is coming up.
2020-09-21 15:32:48 -07:00
Martin Storsjö 36c64af9d7 [CodeGen] [WinException] Only produce handler data at the end of the function if needed
If we are going to write handler data (that is written as variable
length data following after the unwind info in .xdata), we need to
emit the handler data immediately, but for cases where no such
info is going to be written, skip emitting it right away. (Unwind
info for all remaining functions that haven't had it emitted
directly is emitted at the end.)

This does slightly change the ordering of sections (triggering a
bunch of updates to DebugInfo/COFF tests), but the change should be
benign.

This also matches GCC's assembly output, which doesn't output
.seh_handlerdata unless it actually is needed.

For ARM64, the unwind info can be packed into the runtime function
entry itself (leaving no data in the .xdata section at all), but
that can only be done if there's no follow-on data in the .xdata
section. If emission of the unwind info is triggered via
EmitWinEHHandlerData (or the .seh_handlerdata directive), which
implicitly switches to the .xdata section, there's a chance of the
caller wanting to pass further data there, so the packed format
can't be used in that case.

Differential Revision: https://reviews.llvm.org/D87448
2020-09-21 23:42:59 +03:00
Matt Arsenault 55f9f87da2 Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c.

Needed lldb test updates
2020-09-21 15:45:27 -04:00
Arthur Eubanks f4f7df037e [DIE] Remove DeadInstEliminationPass
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.

DeadCodeEliminationPass should be used in all cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87933
2020-09-21 12:12:25 -07:00
Roman Lebedev 64e2cb7e96 [SCEV] Recognize @llvm.uadd.sat as `%y + umin(%x, (-1 - %y))`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = sub nsw nuw i32 4294967295, %y
  %t1 = umin i32 %x, %t0
  %r = add nuw i32 %t1, %y
  ret i32 %r
}
Transformation seems to be correct!

The alternative, naive lowering could be the following,
although I don't think it's better,
though it will likely be needed for sadd/ssub/*shl:

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = zext i32 %x to i33
  %t1 = zext i32 %y to i33
  %t2 = add nuw i33 %t0, %t1
  %t3 = zext i32 4294967295 to i33
  %t4 = umin i33 %t2, %t3
  %r = trunc i33 %t4 to i32
  ret i32 %r
}
Transformation seems to be correct!
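
As a rough cross-check of the first rewrite above, the identity can be brute-forced at a small bit width. This is only a sketch: the 8-bit width and the helper names `uadd_sat` and `uadd_sat_via_umin` are assumptions made for illustration, not LLVM APIs.

```python
# Illustrative check only; names and the 8-bit width are hypothetical choices.
BITS = 8
ALL_ONES = (1 << BITS) - 1


def uadd_sat(x, y):
    """Reference unsigned saturating add on BITS-wide values."""
    return min(x + y, ALL_ONES)


def uadd_sat_via_umin(x, y):
    """%y + umin(%x, (-1 - %y)), with -1 - %y taken modulo 2**BITS."""
    t0 = (ALL_ONES - y) & ALL_ONES  # -1 - y is all-ones minus y
    t1 = min(x, t0)                 # umin
    return t1 + y                   # t1 <= ALL_ONES - y, so this cannot wrap


def identity_holds():
    """Exhaustively compare both forms over all BITS-wide inputs."""
    return all(
        uadd_sat(x, y) == uadd_sat_via_umin(x, y)
        for x in range(ALL_ONES + 1)
        for y in range(ALL_ONES + 1)
    )
```

The final add never overflows because the `umin` clamps `%x` to the headroom left above `%y`, which is consistent with the `nuw` flag on the add in the target function.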
2020-09-21 20:25:54 +03:00
Roman Lebedev fedc9549d5 [SCEV] Recognize @llvm.usub.sat as `%x - (umin %x, %y)`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = usub_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = umin i32 %x, %y
  %r = sub nuw i32 %x, %t0
  ret i32 %r
}
Transformation seems to be correct!
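
The same kind of brute-force check can be sketched for the unsigned saturating subtract. The 8-bit width and the helper names are assumptions for illustration only.

```python
# Illustrative check only; names and the 8-bit width are hypothetical choices.
BITS = 8
LIMIT = 1 << BITS


def usub_sat(x, y):
    """Reference unsigned saturating subtract: clamps at zero."""
    return max(x - y, 0)


def usub_sat_via_umin(x, y):
    """%x - umin(%x, %y); the subtrahend never exceeds %x, so no wrap."""
    return x - min(x, y)


def identity_holds():
    """Exhaustively compare both forms over all BITS-wide inputs."""
    return all(
        usub_sat(x, y) == usub_sat_via_umin(x, y)
        for x in range(LIMIT)
        for y in range(LIMIT)
    )
```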
2020-09-21 20:25:54 +03:00
Roman Lebedev 0592de550f [NFC][SCEV] Add tests for @llvm.*.sat intrinsics 2020-09-21 20:25:53 +03:00
Roman Lebedev 1bb7ab8c4a [SCEV] Recognize @llvm.abs as smax(x, -x)
As per alive2 (ignoring undef):

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 0
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 1
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul nsw i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!
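
A small sketch of the first case above (second `abs` argument 0, so the minimum signed value is not poison), assuming an 8-bit width and two's-complement wrapping. The helper names are hypothetical.

```python
# Illustrative check only; names and the 8-bit width are hypothetical choices.
BITS = 8
ALL_ONES = (1 << BITS) - 1


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement value."""
    v &= ALL_ONES
    return v - (1 << BITS) if v >= 1 << (BITS - 1) else v


def abs_wrapping(x):
    """Reference abs where the minimum signed value wraps to itself."""
    return to_signed(abs(x))


def abs_via_smax(x):
    """smax(x, x * -1), with the multiply wrapping modulo 2**BITS."""
    neg_x = to_signed(x * ALL_ONES)  # multiplying by all-ones negates
    return max(x, neg_x)


def identity_holds():
    """Exhaustively compare both forms over every signed BITS-wide input."""
    lo, hi = -(1 << (BITS - 1)), 1 << (BITS - 1)
    return all(abs_wrapping(x) == abs_via_smax(x) for x in range(lo, hi))
```

The only input where the negation wraps is the minimum signed value, which both forms map to itself. That is the case the `nsw` variant rules out.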
2020-09-21 20:25:53 +03:00
Roman Lebedev 83c2d10d3c [NFC][SCEV] Add tests for @llvm.abs intrinsic 2020-09-21 20:25:53 +03:00
Florian Hahn 3cbdfe424f [SCEV] Add additional max BTC tests with loop guards. 2020-09-21 17:41:24 +01:00
Arthur Eubanks 024979b7b6 [ObjCARC][NewPM] Port objc-arc-contract to NPM
Similar to https://reviews.llvm.org/D86178.

This is a module pass instead of a function pass since
ARCRuntimeEntryPoints can lazily add function declarations.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D87806
2020-09-21 09:40:14 -07:00
Momchil Velikov 742250bf62 [ARM][CMSE] Issue an error if passing arguments through memory across
security boundary

It was never supported and that part was accidentally omitted when
upstreaming D76518.

Differential Revision: https://reviews.llvm.org/D86478

Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039
2020-09-21 17:26:10 +01:00
Baptiste Saleil 1372e23c7d [PowerPC] Add vector pair load/store instructions and vector pair register class
This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp
instructions in the PowerPC backend. These instructions allow loading and
storing VSX register pairs. This patch also adds the VSRp register class
definition needed for these instructions.

Differential Revision: https://reviews.llvm.org/D84359
2020-09-21 10:27:47 -05:00
Arthur Eubanks 5249e6f248 [LoopSimplifyCFG][NewPM] Rename simplify-cfg -> loop-simplifycfg
This matches the legacy PM name and makes all tests in
Transforms/LoopSimplifyCFG pass under NPM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87948
2020-09-21 08:27:19 -07:00
Simon Pilgrim 18a3ebcd30 [CostModel][X86] Add some select shuffle costs tests for D87884 2020-09-21 16:09:05 +01:00
Alexey Bataev 3ff07fcd54 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we cannot try to reorder
the tree, and we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves
are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first
leaf, it will be shrunk to <a, b>. If instructions in this leaf should
be reordered, the best order will be <1, 0>. We need to extend this
order for the root node. For the root node this order should look like
<3, 0, 1, 2>. This patch allows extension of the orders of the nodes
with the reused instructions.
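
One way to read the order extension in the example is sketched below. It is an illustration only, not the patch's algorithm: it assumes the shrunk leaf holds the distinct scalars of <a, a, a, f> in first-occurrence order, and `extend_order` is a hypothetical helper.

```python
# Hypothetical illustration of extending an order computed on a
# deduplicated leaf back to the full-width root node.
def extend_order(leaf, unique_order):
    """Map an order over the leaf's distinct scalars to lane indices.

    leaf:         scalars per lane, possibly with repeats
    unique_order: permutation of indices into the deduplicated leaf
    """
    # Distinct scalars, kept in first-occurrence order (assumption).
    unique = list(dict.fromkeys(leaf))
    extended = []
    for idx in unique_order:
        scalar = unique[idx]
        # Every lane holding this scalar follows, in original lane order.
        extended.extend(i for i, s in enumerate(leaf) if s == scalar)
    return extended
```

Under that reading, `extend_order(["a", "a", "a", "f"], [1, 0])` gives `[3, 0, 1, 2]`, which matches the root-node order quoted in the message.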

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-21 10:51:03 -04:00
Denis Antrushin ee86688b81 [Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value.
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes the importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like this:

  %0 = STATEPOINT
  %1 = COPY %0

landing_pad:
  <use of %1>

And when the exceptional path is taken, %1 is left uninitialized (COPY is never
executed).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87695
2020-09-21 19:44:46 +07:00
Paul Walker f3fa954b5b [SVE] Change definition of reduction ISD nodes to have an SVE vector result type.
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result.  This is incorrect because they modify
the complete SVE register and are thus changed to represent such.

This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.

NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.

Differential Revision: https://reviews.llvm.org/D87843
2020-09-21 13:16:28 +01:00
Paul Walker 6457455248 [SVE] Use NEON for extract_vector_elt when the index is in range.
Patch also adds missing patterns for unpacked vector types and
extracts of element zero.

Differential Revision: https://reviews.llvm.org/D87842
2020-09-21 13:12:28 +01:00
Florian Hahn 11dccf8d3a Recommit "[SCEV] Look through single value PHIs."
This commit was originally reverted because it was suspected to cause a crash,
but a reproducer did not surface.

A crash that was exposed by this change was fixed in 1d8f2e5292.

This reverts the revert commit 0581c0b0ee.
2020-09-21 11:59:50 +01:00
David Green f4c5cadbcb [ARM] Select f32 constants with vmov.f16
This adds lowering for f32 values using the vmov.f16, which zeroes the
top bits whilst setting the lower bits to a pattern. This range of
values does not often come up, except where a f16 constant value has
been converted to a f32.

Differential Revision: https://reviews.llvm.org/D87790
2020-09-21 11:10:47 +01:00
Georgii Rymar 095f6fbbd7 [llvm-readelf/obj] - Stop printing invalid names for unnamed section symbols.
We have an issue with `ELFDumper<ELFT>::getSymbolSectionName`:
1) It is used deeply for both LLVM/GNU styles and might return LLVM-style only
   values to describe symbols: "Undefined", "Processor Specific", "Absolute", etc.

2) `getSymbolSectionName` is used by `getFullSymbolName` and these special values
   might appear instead of symbol names in many places.
   This occurs for unnamed section symbols currently.

This patch extracts the LLVM specific logic to `LLVMStyle<ELFT>::printSymbolSection`,
which seems to be the only place where we want to print the special values mentioned.
It also adds a meaningful new warning that is reported when we are unable to get
a section index for a section symbol.

Differential revision: https://reviews.llvm.org/D87764
2020-09-21 13:05:46 +03:00
Sam Parker 13c73632c7 [NFC][ARM] More tail predication tests.
Add mir tests for use/def of P0.
2020-09-21 10:58:05 +01:00
Max Kazantsev 98aed8aa00 [Test] Test auto-update 2020-09-21 16:06:18 +07:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of values in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type conversions for
function arguments. The checking for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
in spite of, the regular Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper conversion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contained in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to preserve MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.

Fixes PR47557.
2020-09-21 09:51:26 +01:00
Qiu Chaofan 1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up to D86605. For a strict DAG FP node, if its FP
exception behavior metadata is 'ignore', it should have the nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on X86 target.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Fangrui Song d06485685d [XRay] Change mips to use version 2 sled (PC-relative address)
Follow-up to D78590. All targets use PC-relative addresses now.

Reviewed By: atanasyan, dberris

Differential Revision: https://reviews.llvm.org/D87977
2020-09-20 17:59:57 -07:00
wlei a8b8a9374a [llvm-profdata] Fix llvm-profdata crash on compact binary profile
llvm-profdata `show` and `overlap` crash in `getFuncName` on a compact binary profile. This change fixes that by switching to `getName`.

`getFuncName` is misused in llvm-profdata. `GUIDToFuncNameMap` is only supported in compilation mode and is never initialized in llvm-profdata, so a compact profile (whose MD5 flag is true) queries the uninitialized `GUIDToFuncNameMap` and crashes.

Reviewed By: MaskRay, wmi, wenlei, weihe, hoy

Differential Revision: https://reviews.llvm.org/D87740
2020-09-20 16:58:34 -07:00
Craig Topper a74b1faba2 [X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization.
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.

Differential Revision: https://reviews.llvm.org/D87863
2020-09-20 13:54:20 -07:00
Craig Topper c89b3af0e3 [X86] Pre-commit test cases for D87863. NFC 2020-09-20 13:53:05 -07:00
Craig Topper 4e8c028158 [X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode
If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores.

Differential Revision: https://reviews.llvm.org/D87862
2020-09-20 13:46:59 -07:00
Craig Topper 9b1c98c0fb [X86] Add 32-bit command lines to masked_store.ll and masked_load.ll 2020-09-20 13:46:59 -07:00
David Green 29bd8ea110 [ARM] Constant fold VMOVrh
This adds simple constant folding for VMOVrh, to constant fold fp16
constants to integer values. It can help especially with soft calling
conventions, but some of the results are not optimal as we end up
loading using a vldr. This will be improved in a follow up patch.

Differential Revision: https://reviews.llvm.org/D87789
2020-09-20 21:32:51 +01:00
Nikita Popov 1a27238098 [CVP] Additional tests for comparison with offset (NFC)
Both icmps have an additional offset here. We would fold this if
the second one didn't.
2020-09-20 22:10:34 +02:00
Nikita Popov 445db89b53 [LVI] Get value range from mask comparison
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. Can of course also be useful
for proper mask comparisons.

For the sake of clarity, the implementation goes through KnownBits
to compute the range.
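
A minimal sketch of how a range can fall out of a mask comparison in a KnownBits-like way, assuming 8-bit unsigned values. `range_from_mask_compare` is a hypothetical helper, not the LVI implementation: bits selected by the mask are known from the constant, and the remaining bits are free.

```python
# Illustrative sketch only; the helper name and 8-bit width are assumptions.
BITS = 8
ALL_ONES = (1 << BITS) - 1


def range_from_mask_compare(mask, c):
    """Unsigned (lo, hi) bounds on X implied by (X & mask) == c.

    Returns None when c has bits outside mask, since no X can satisfy it.
    The bounds may over-approximate when the unknown bits are not the
    lowest ones.
    """
    unknown = ~mask & ALL_ONES
    if c & unknown:
        return None
    lo = c            # all unknown bits cleared
    hi = c | unknown  # all unknown bits set
    return lo, hi
```

For the canonicalized form in the message, `-2` is `0xFE` at this width, so `range_from_mask_compare(0xFE, 4)` gives `(4, 5)`, that is X == 4 || X == 5.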
2020-09-20 21:13:57 +02:00
Nikita Popov 91af6a78d0 [CVP] Add tests for mask comparisons (NFC) 2020-09-20 21:13:57 +02:00
Simon Pilgrim 0bfeede669 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X) 2020-09-20 18:39:12 +01:00
Simon Pilgrim bb0078e591 [X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 18:39:12 +01:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU

This also solves:
https://llvm.org/PR47584
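
The value identity behind the factorization can be sketched as a brute-force check at a small bit width. This only checks that the results agree modulo 2**BITS. It does not model the nsw/nuw propagation discussed above, and the 6-bit width and helper names are assumptions.

```python
# Illustrative check only; names and the 6-bit width are hypothetical choices.
BITS = 6
ALL_ONES = (1 << BITS) - 1


def shl(v, z):
    """Left shift with the result truncated to BITS bits."""
    return (v << z) & ALL_ONES


def unfactored(x, y, z, sub=False):
    """(x << z) + (y << z), or the subtract form, modulo 2**BITS."""
    a, b = shl(x, z), shl(y, z)
    return (a - b if sub else a + b) & ALL_ONES


def factored(x, y, z, sub=False):
    """(x + y) << z, or the subtract form, modulo 2**BITS."""
    inner = (x - y if sub else x + y) & ALL_ONES
    return shl(inner, z)


def identities_hold():
    """Compare both forms for every input and every in-range shift amount."""
    return all(
        unfactored(x, y, z, sub) == factored(x, y, z, sub)
        for x in range(ALL_ONES + 1)
        for y in range(ALL_ONES + 1)
        for z in range(BITS)
        for sub in (False, True)
    )
```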
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Simon Pilgrim 15c8306056 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 16:33:02 +01:00
Simon Pilgrim a0c8793ce6 [X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets.
Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.
2020-09-20 16:05:10 +01:00
Dávid Bolvanský 2990518b03 [MemLoc] Support llvm.memcpy.inline in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87971
2020-09-20 14:01:48 +02:00
Nikita Popov a2f9098f7a [InstCombine] Regenerate test checks (NFC) 2020-09-19 21:07:54 +02:00
Roman Lebedev bb6f4d32aa [NFC][PhaseOrdering] Add test showing SROA not being performed after loop unrolling 2020-09-19 21:18:35 +03:00
Dávid Bolvanský fa33235df5 [BasicAA] Regenerate test checks 2020-09-19 19:36:10 +02:00
Dávid Bolvanský d716f1608c [MemLoc] Support bcmp in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87964
2020-09-19 17:12:43 +02:00
Sanjay Patel 534e9132af [InstCombine] auto-generate test checks; NFC 2020-09-19 11:06:47 -04:00
Sanjay Patel 2c3d199fbf [InstCombine] regenerate test checks; NFC 2020-09-19 10:43:18 -04:00
Sanjay Patel f74a334fe3 [ConstantFolding] add undef handling for fmin/fmax intrinsics
The output here may not be optimal (yet), but it should be
consistent for commuted operands (it was not before) and
correct. We can do better by checking FMF and NaN if needed.

Code in InstSimplify generally assumes that we have already
folded code like this, so it was not handling 2 constant
inputs by commuting consistently.
2020-09-19 10:31:01 -04:00
Amara Emerson 5a50f8b39f [AArch64][GlobalISel] Add legalization and selection support for <4 x s16> G_SHL. 2020-09-18 23:32:01 -07:00
Xun Li 11453740bc [ASAN] Properly deal with musttail calls in ASAN
When address sanitizing a function, stack unpoisoning code is inserted before each ret instruction. However, if the ret instruction is preceded by a musttail call, such a transformation breaks the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.

Differential Revision: https://reviews.llvm.org/D87777
2020-09-18 23:10:34 -07:00
Craig Topper 58ecbbcdcd [X86] Fix copy paste mistake in @ccnp flag.
We were treating @ccp and @ccnp the same.
2020-09-18 21:28:01 -07:00
Craig Topper 5e6baf78e5 [X86] Invert the compares in inline-asm-flag-output.ll so that the setcc instruction condition matches the test name. NFC
Also add nounwind to the tests to remove cfi directives.
2020-09-18 21:23:53 -07:00
David Blaikie ad68a8b952 DebugInfo: Cleanup RLE dumping, using a length-constrained DataExtractor rather than carrying the end offset separately 2020-09-18 19:32:38 -07:00
Alexander Shaposhnikov 5495b69164 [llvm-objcopy][MachO] Add llvm-bitcode-strip driver
This diff adds llvm-bitcode-strip driver to llvm-objcopy.
In the future this will enable us to build a replacement for the tool bitcode_strip.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D87212
2020-09-18 18:13:05 -07:00
Eric Christopher dbd53a1f0c Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa.
2020-09-18 18:11:21 -07:00
Amara Emerson cce24bb38d [AArch64][GlobalISel] Add tests for pre-existing selection support for <4 x s16> arithmetic/bitwise ops. 2020-09-18 17:13:55 -07:00
Amara Emerson 269bcc39ca [AArch64][GlobalISel] Legalize arithmetic ops for <4 x s16> 2020-09-18 17:13:55 -07:00
Amara Emerson 5d34d7f1a0 [GlobalISel] Add lowering support for G_ABS and use for AArch64.
Differential Revision: https://reviews.llvm.org/D87952
2020-09-18 16:17:18 -07:00
Amy Kwan 37e7673c21 [PowerPC] Implement Move to VSR Mask builtins in LLVM/Clang
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82725
2020-09-18 18:16:14 -05:00
Philip Reames 06f136f61e [instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic
If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one and and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
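
A sketch of why one and plus one shift suffices for a contiguous mask, checked against bit-by-bit reference versions of the two operations. The helper names and the 8-bit width are assumptions for illustration, and the reference loops are not how hardware or LLVM implements them.

```python
# Illustrative check only; names and the 8-bit width are hypothetical choices.
BITS = 8


def pext(x, mask):
    """Reference parallel extract: gather the masked bits of x, packed low."""
    out, k = 0, 0
    while mask:
        low = mask & -mask  # lowest set bit of the mask
        if x & low:
            out |= 1 << k
        k += 1
        mask &= mask - 1    # clear that bit
    return out


def pdep(x, mask):
    """Reference parallel deposit: scatter the low bits of x into the mask."""
    out, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            out |= low
        k += 1
        mask &= mask - 1
    return out


def shifted_mask_forms_agree():
    """For every contiguous mask, compare against the and/shift forms."""
    for shift in range(BITS):
        for width in range(1, BITS - shift + 1):
            mask = ((1 << width) - 1) << shift
            for x in range(1 << BITS):
                if pext(x, mask) != (x & mask) >> shift:
                    return False
                if pdep(x, mask) != (x << shift) & mask:
                    return False
    return True
```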

The cost modelling for multiple contiguous blocks might be worth exploring in a follow up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs where these are really really slow though.

Differential Revision: https://reviews.llvm.org/D87861
2020-09-18 14:54:24 -07:00
Arthur Eubanks 7c10129f5a [test][InstrProf] Fix always_inline.ll under NPM
NPM's inliner does not clean up dead functions.

Differential Revision: https://reviews.llvm.org/D87922
2020-09-18 14:50:47 -07:00
Reid Kleckner 9932561b48 [COFF] Move per-global .drective emission from AsmPrinter to TLOFCOFF
This changes the order of output sections and the output assembly, but
is otherwise NFC.

It simplifies the TLOF interface by removing two COFF-only methods.
2020-09-18 14:31:01 -07:00
Vedant Kumar 3c731ba5f1 [llvm-cov] Allow commas in filenames passed to `-object` flag
Currently, -object takes a comma-separated list of objects as an
argument, which prevents it from working with path names that contain a
comma. Drop comma-separated support, which means the -object flag must
be passed multiple times to specify multiple objects.

Patch by Andrew Gallagher!

Differential Revision: https://reviews.llvm.org/D87003
2020-09-18 13:46:29 -07:00
Sanjay Patel d3b0644e22 [InstSimplify] add tests for constant folding fmin/fmax with undef op; NFC 2020-09-18 16:09:44 -04:00
Eric Christopher ecfd8161bf Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions."
as it's infinite looping on occasion.

This reverts commit 455ca0ebb6.
2020-09-18 12:50:04 -07:00
Krzysztof Parzyszek ae0ecb3c50 Pre-commit test for CSEing masked loads/stores 2020-09-18 14:30:53 -05:00
Simon Pilgrim 4ebd30722a [X86][AVX] lowerBuildVectorAsBroadcast - improve BROADCASTM lowering on non-VLX targets
Broadcast to a ZMM type then extract the low subvector.
2020-09-18 19:52:02 +01:00
Arthur Eubanks 2b1cb6d54a [test][TSan] Fix tests under NPM
Under NPM, the TSan passes are split into a module and function pass. A
couple tests were testing for inserted module constructors, which is
only part of the module pass.
2020-09-18 11:37:55 -07:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate over a scalable vector because the number of elements is unknown at compile time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
James Y Knight f7a53d82c0 PR47468: Fix findPHICopyInsertPoint, so that copies aren't incorrectly inserted after an INLINEASM_BR.
findPHICopyInsertPoint special cases placement in a block with a
callbr or invoke in it. In that case, we must ensure that the copy is
placed before the INLINEASM_BR or call instruction, if the register is
defined prior to that instruction, because it may jump out of the
block.

Previously, the code placed it immediately after the last def _or
use_. This is wrong if the use is the instruction that may jump. We
could correctly place it immediately after the last def (ignoring
uses), but that is suboptimal for register pressure.

Instead, place the copy after the last def, or before the
call/inlineasm_br, whichever is later.

Differential Revision: https://reviews.llvm.org/D87865
2020-09-18 14:14:04 -04:00
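
The placement rule in this commit ("after the last def, or before the call/inlineasm_br, whichever is later") can be sketched as a backward scan over a block. This is a toy model in Python, not the LLVM implementation: the `Instr` class, its `may_jump` flag, and `find_copy_insert_index` are hypothetical names, and details such as skipping PHIs and labels are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Toy instruction: a name, the registers it defines, and a jump flag."""
    name: str
    defs: frozenset = field(default_factory=frozenset)
    # Models a call/invoke or INLINEASM_BR that may jump out of the block.
    may_jump: bool = False


def find_copy_insert_index(block, reg):
    """Return the index in `block` at which a copy of `reg` should be inserted."""
    # Walk backwards so the first match is the latest candidate position.
    for i in range(len(block) - 1, -1, -1):
        if reg in block[i].defs:
            return i + 1  # immediately after the last def
        if block[i].may_jump:
            return i  # immediately before the instruction that may jump
    return 0  # no def and no jumping instruction: start of the block


if __name__ == "__main__":
    block = [
        Instr("def", frozenset({"r1"})),
        Instr("add"),
        Instr("INLINEASM_BR", may_jump=True),  # uses r1 and may leave the block
        Instr("B"),
    ]
    print(find_copy_insert_index(block, "r1"))
```

Uses are ignored on purpose: a use by the jumping instruction must not push the copy past it, which is the bug the commit describes.
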
Simon Pilgrim ecba9d793e [X86][AVX] Add missing non AVX512VL broadcastm test coverage 2020-09-18 19:11:29 +01:00
Matt Arsenault c8757ff3aa RegAllocFast: Rewrite and improve
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

- Track register state on register units instead of physical
  registers. This simplifies and speeds up handling of register aliases.
- Process basic blocks in reverse order: definitions are known to end
  register live ranges when walking backwards (by contrast, when walking
  forward, a use may or may not be a kill, so we need heuristics).
- Check register mask operands (calls) instead of conservatively
  assuming everything is clobbered.
- Enhance heuristics to detect killing uses: in case of a small number
  of defs/uses, check if they are all in the same basic block and, if so,
  the last one is a killing use.
- Enhance heuristic for copy-coalescing through hinting: we check the
  first k defs of a register for COPYs rather than relying on there just
  being a single definition.

When testing this on the full llvm test-suite including SPEC externals I
measured:

- average 5.1% reduction in code size for X86, 4.9% reduction in code
  size on aarch64 (ranging between 0% and 20% depending on the test)
- 0.5% faster compiletime (some analysis suggests the pass is slightly
  slower than before, but we more than make up for it because later
  passes are faster with the reduced instruction count)

Also adds a few testcases that were broken without this patch, in
particular bug 47278.

Patch mostly by Matthias Braun
2020-09-18 14:05:18 -04:00
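
The point about processing blocks in reverse order can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the allocator itself: instructions are plain `(defs, uses)` tuples, and `find_kills` is a hypothetical helper. Walking backwards, a definition ends the register's live range, and the first use seen for a register that is not currently live must be its last use, so no heuristic is needed to identify kills.

```python
def find_kills(block, live_out=()):
    """Walk a block backwards and report killing uses.

    `block` is a list of (defs, uses) tuples of register names.
    Returns (kills, live_in), where kills is a list of (index, register).
    """
    live = set(live_out)
    kills = []
    for idx in range(len(block) - 1, -1, -1):
        defs, uses = block[idx]
        # A definition ends the live range when scanning in reverse.
        for reg in defs:
            live.discard(reg)
        # A use of a register that is not live yet is its last use.
        for reg in uses:
            if reg not in live:
                kills.append((idx, reg))
                live.add(reg)
    return kills, live


if __name__ == "__main__":
    block = [
        (["a"], []),          # a = ...
        (["b"], ["a"]),       # b = f(a)
        (["c"], ["a", "b"]),  # c = g(a, b)
        ([], ["c"]),          # use c
    ]
    print(find_kills(block))
```

In a forward walk the use of `a` at index 1 could not be classified without looking ahead; in the backward walk it is already known to be live when it is reached.
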
Matt Arsenault 870fd53e4f Reapply "RegAllocFast: Record internal state based on register units"
The regressions this caused should be fixed when
https://reviews.llvm.org/D52010 is applied.

This reverts commit a21387c654.
2020-09-18 14:05:18 -04:00
Zequan Wu 91aed9bf97 [CodeGen] emit CG profile for COFF object file
I forgot to add emission of the CG profile for COFF object files when adding the support (https://reviews.llvm.org/D81775).

Differential Revision: https://reviews.llvm.org/D87811
2020-09-18 10:57:54 -07:00
Arthur Eubanks d419e34c4d [test][HWAsan] Fix kernel-inline.ll under NPM 2020-09-18 10:56:08 -07:00
Arthur Eubanks 06fe76cc4f [ASan][NewPM] Fix byref-args.ll under NPM 2020-09-18 10:50:53 -07:00
Matt Arsenault 0576f436e5 AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9, this would sometimes
not emit the `or` to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Amara Emerson 615695de27 [AArch64][GlobalISel] Make <8 x s8> of G_BUILD_VECTOR legal. 2020-09-18 10:32:33 -07:00
Simon Pilgrim ceadd98c2f [X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets
We already handle the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well, such as i64 -> vXi64 splat cases on 32-bit targets.
2020-09-18 16:59:57 +01:00
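
The 'zero extended splat' shape mentioned above, (a, 0, 0, 0, a, 0, 0, 0, ...), can be recognised with a simple check. The Python sketch below is only an illustration of the pattern, not the X86 lowering code; the `zero_extended_splat` name and the `stride` parameter are hypothetical.

```python
def zero_extended_splat(elements, stride):
    """Return the splatted scalar if `elements` looks like
    (a, 0, ..., 0, a, 0, ..., 0, ...) with period `stride`, else None."""
    if stride <= 0 or not elements or len(elements) % stride != 0:
        return None
    value = elements[0]
    for i, elem in enumerate(elements):
        # Every stride-th slot holds the scalar; all other slots must be zero.
        expected = value if i % stride == 0 else 0
        if elem != expected:
            return None
    return value


if __name__ == "__main__":
    print(zero_extended_splat([7, 0, 0, 0, 7, 0, 0, 0], 4))
    print(zero_extended_splat([7, 0, 9, 0], 2))
```
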