58359 Commits

Author SHA1 Message Date
Craig Topper fad1589f39 Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.

llvm-svn: 350562
2019-01-07 19:39:05 +00:00
Craig Topper 826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24 to use GetDemandedBits to handle the multiple-use simplifications, and then the regular SimplifyDemandedBits on each operand to handle the simplifications allowed when the operand has only a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Craig Topper 9c4f7e9147 [X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377

llvm-svn: 350554
2019-01-07 19:10:12 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Simon Pilgrim 32f77f2b52 [X86] Add OR(AND(X,C),AND(Y,~C)) bit select tests
Based off work for D55935
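
For readers unfamiliar with the pattern named in the title, here is a minimal scalar sketch of a bit select. The function names `bitSelect` and `bitSelectXor` are hypothetical and do not come from the commit; the second form is a well-known equivalent identity, not something the commit claims to emit.

```cpp
#include <cassert>
#include <cstdint>

// Bit select: take bits from x where the mask c is 1 and bits from y where
// c is 0. This is the OR(AND(X,C),AND(Y,~C)) shape from the commit title.
// (Hypothetical helper name, for illustration only.)
constexpr uint32_t bitSelect(uint32_t x, uint32_t y, uint32_t c) {
  return (x & c) | (y & ~c);
}

// An equivalent form that avoids inverting the mask.
constexpr uint32_t bitSelectXor(uint32_t x, uint32_t y, uint32_t c) {
  return y ^ ((x ^ y) & c);
}

// Upper half comes from x, lower half from y.
static_assert(bitSelect(0xAAAAAAAAu, 0x55555555u, 0xFFFF0000u) == 0xAAAA5555u,
              "bit select picks per-bit between x and y");
```

Both helpers agree for every input, so either can stand in for the other when reading the tests.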

llvm-svn: 350548
2019-01-07 18:07:56 +00:00
Armando Montanez 488545ef15 [elfabi] Add option to manually specify file read format
Although llvm-elfabi will attempt to read input files without needing the format to be manually specified, doing so has the potential to introduce extraneous errors that can hinder debugging (since multiple readers may fail in attempts to read the file). This change allows the input file format to be manually specified to force elfabi to use a single reader. This makes it easier to test and debug errors specific to a given reader.

llvm-svn: 350545
2019-01-07 17:33:10 +00:00
Jordan Rupprecht 70038e01c8 [llvm-objcopy] Handle -O <format> flag.
Summary:
The -O flag is currently being mostly ignored; it's only checked whether or not the output format is "binary". This adds support for a few formats (e.g. elf64-x86-64), so that when specified, the output can change between 32/64 bit and sizes/alignments are updated accordingly.

This fixes PR39135

Reviewers: jakehehrlich, jhenderson, alexshap, espindola

Reviewed By: jhenderson

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D53667

llvm-svn: 350541
2019-01-07 16:59:12 +00:00
Sanjay Patel 47f92d3270 [x86] add more tests for LowerToHorizontalOp(); NFC
These tests show missed optimizations and a miscompile
similar to PR40243 - https://bugs.llvm.org/show_bug.cgi?id=40243

llvm-svn: 350533
2019-01-07 16:10:14 +00:00
Rhys Perry f77e2e8406 AMDGPU: test for uniformity of branch instruction, not its condition
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.

Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56331

llvm-svn: 350532
2019-01-07 15:52:28 +00:00
James Henderson 9e014b6c3d [llvm-nm] Add --portability as alias for --format=posix
GNU nm supports this alias, so supporting it in llvm-nm makes it easier
to transition between the two.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40002

Reviewed by: mstorsjo, rupprecht

Differential Revision: https://reviews.llvm.org/D56312

llvm-svn: 350522
2019-01-07 14:12:51 +00:00
Matt Arsenault 369acb8470 AMDGPU: Remove VS/SV mappings from select
These would violate the constant bus restriction

llvm-svn: 350517
2019-01-07 13:21:36 +00:00
Simon Pilgrim 6aac0ec21f Regenerate test.
Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118.

llvm-svn: 350514
2019-01-07 12:21:13 +00:00
Simon Pilgrim 09bf22862a Regenerate test.
Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118.

llvm-svn: 350513
2019-01-07 12:20:35 +00:00
Craig Topper 1ac0839098 [X86] Update VBMI2 vshld/vshrd tests to use an immediate that doesn't require a modulo.
Planning to replace these with funnel shift intrinsics which would mask out the extra bits. This will help minimize test diffs.

llvm-svn: 350504
2019-01-07 05:58:53 +00:00
Craig Topper 6ffeeb705f [X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions.
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56361

llvm-svn: 350498
2019-01-06 18:10:18 +00:00
Craig Topper d0ba531a0c [X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL.
llvm-svn: 350481
2019-01-05 22:42:58 +00:00
Craig Topper 46f8b4a11e [X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.

llvm-svn: 350480
2019-01-05 21:40:07 +00:00
Stanislav Mekhanoshin 35a3a3bd11 Added single use check to ShrinkDemandedConstant
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.

Differential Revision: https://reviews.llvm.org/D56289

llvm-svn: 350475
2019-01-05 19:20:00 +00:00
Craig Topper 27406e1f9e [X86] Regenerate test to merge 32-bit and 64-bit check lines. NFC
llvm-svn: 350474
2019-01-05 19:19:37 +00:00
Craig Topper 3f48dbf72e [X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available.
llvm-svn: 350473
2019-01-05 18:48:11 +00:00
Nikita Popov 25a02c12f1 [InstCombine] Improve cttz/ctlz + icmp tests; NFC
Change part of the tests to use vectors (I'm using scalar for ugt
and vector for ult), add multiuse variations, rename %lz to %tz
for the cttz tests.

llvm-svn: 350471
2019-01-05 17:36:05 +00:00
Nikita Popov b46680407d [InstCombine] Add cttz/ctlz + icmp ugt/ult tests; NFC
llvm-svn: 350468
2019-01-05 15:51:59 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
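
The relaxation described above can be modelled in plain C++. This is only a sketch of the semantics, not the InstCombine code: `cttzDefined` stands in for cttz(x, false), `cttzNonZero` for cttz(x, true), and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models cttz(x, false): the result is defined (the bit width) for zero.
inline unsigned cttzDefined(uint32_t x) {
  if (x == 0)
    return 32;
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Models cttz(x, true): the caller must guarantee x != 0, so the zero
// check can be dropped.
inline unsigned cttzNonZero(uint32_t x) {
  assert(x != 0 && "result is undefined for zero");
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Before the transform: x != 0 ? cttz(x, false) : y
inline unsigned selectBefore(uint32_t x, unsigned y) {
  return x != 0 ? cttzDefined(x) : y;
}

// After the transform: the select already guards against zero, so the
// cheaper variant is safe on the taken arm.
inline unsigned selectAfter(uint32_t x, unsigned y) {
  return x != 0 ? cttzNonZero(x) : y;
}
```

The two select functions return the same value for every input, which is why the relaxation is sound.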

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Nikita Popov 7bd4900ba0 [InstCombine] Add vector tests for select + ctlz/cttz; NFC
llvm-svn: 350462
2019-01-05 09:48:05 +00:00
Evgeniy Stepanov 0184c53cbd Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)""
This reapplies commit r348983.

llvm-svn: 350448
2019-01-05 00:44:58 +00:00
Vyacheslav Zakharin 0a6f86c54b Update the pr_datasz of .note.gnu.property section.
Patch by Xiang Zhang.

Differential Revision: https://reviews.llvm.org/D56080

llvm-svn: 350436
2019-01-04 21:25:01 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Craig Topper cfeb1cf9af [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.

My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.

Differential Revision: https://reviews.llvm.org/D56283

llvm-svn: 350432
2019-01-04 20:50:59 +00:00
Sanjay Patel 6a5656703e [x86] add tests for potential horizontal vector ops; NFC
These are modified versions of the FP tests from rL349923.

llvm-svn: 350430
2019-01-04 20:14:53 +00:00
Peter Collingbourne 87f477b5e4 hwasan: Implement lazy thread initialization for the interceptor ABI.
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer, its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.
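
A rough sketch of the lazy initialization scheme, written as ordinary C++ and not as the real runtime. Everything here is an assumption made for illustration: `ThreadState`, `tlsSlot`, `threadEnter` (a stand-in for `__hwasan_thread_enter`), and `initCalls` are hypothetical names, and a `thread_local` variable plays the role of the sanitizer TLS slot.

```cpp
#include <cassert>

// Hypothetical per-thread data, standing in for the ring buffer state.
struct ThreadState {
  int ringBufferEntries = 0;
};

// Stand-in for the sanitizer TLS slot: null until the thread is set up.
thread_local ThreadState *tlsSlot = nullptr;
thread_local ThreadState threadStorage;
// Counts how often the slow path ran on this thread (for demonstration).
thread_local int initCalls = 0;

// Stand-in for __hwasan_thread_enter: performs the per-thread setup.
inline void threadEnter() {
  ++initCalls;
  tlsSlot = &threadStorage;
}

// What a function prologue does in this model: check the slot, take the
// cold path only if it is still null, then reload the slot.
inline ThreadState *getThreadState() {
  if (tlsSlot == nullptr)
    threadEnter();
  return tlsSlot;
}

// An "instrumented" function that records itself in the per-thread buffer.
inline void instrumentedFunction() {
  getThreadState()->ringBufferEntries++;
}
```

Calling `instrumentedFunction()` several times on one thread runs `threadEnter()` only once, which is the point of doing the initialization lazily.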

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.

Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Differential Revision: https://reviews.llvm.org/D56038

llvm-svn: 350429
2019-01-04 19:27:04 +00:00
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.

Fix by adding a pass that canonicalizes aliases, which ensures each
alias is a direct alias of the base object.
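
As a rough illustration of what flattening an alias chain means, the sketch below models aliases as a name-to-aliasee map. It is not the pass added by this commit; `AliasMap`, `resolveBase`, and `canonicalize` are hypothetical names, and the model assumes the chains contain no cycles.

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps an alias name to its immediate aliasee. Names that do not appear as
// keys are treated as base objects. (Hypothetical model, for illustration.)
using AliasMap = std::map<std::string, std::string>;

// Follow the chain of aliasees until a base object is reached.
// Assumes the alias chains are acyclic.
inline std::string resolveBase(const AliasMap &aliases, std::string name) {
  auto it = aliases.find(name);
  while (it != aliases.end()) {
    name = it->second;
    it = aliases.find(name);
  }
  return name;
}

// Rewrite every alias so that it points directly at its base object.
inline AliasMap canonicalize(const AliasMap &aliases) {
  AliasMap flat;
  for (const auto &entry : aliases)
    flat[entry.first] = resolveBase(aliases, entry.first);
  return flat;
}
```

For a chain where `a2` aliases `a1` and `a1` aliases `base`, the canonical form maps both `a1` and `a2` straight to `base`.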

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
Sanjay Patel 6153565511 [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.

Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350421
2019-01-04 17:48:13 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.
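
To make the equivalence concrete, here is a small C++ model of an 8-bit rotate written with wide shifts and a narrowing cast, next to a model of a funnel shift left. This is a sketch under assumptions: `rotl8Idiom` and `fshl8` are hypothetical names, and the exact IR shapes that the transform matches are not claimed here.

```cpp
#include <cassert>
#include <cstdint>

// Rotate-left idiom on an 8-bit value. Integer promotion makes the shifts
// and the OR happen in a wider type; the cast narrows the result back.
constexpr uint8_t rotl8Idiom(uint8_t x, unsigned amt) {
  return static_cast<uint8_t>((x << (amt & 7)) | (x >> ((8 - (amt & 7)) & 7)));
}

// Model of an 8-bit funnel shift left: concatenate hi:lo, shift left by
// amt modulo 8, and keep the upper 8 bits of the 16-bit result.
constexpr uint8_t fshl8(uint8_t hi, uint8_t lo, unsigned amt) {
  return static_cast<uint8_t>(
      (((static_cast<unsigned>(hi) << 8) | lo) << (amt % 8)) >> 8);
}

// A rotate is a funnel shift with both inputs equal.
static_assert(rotl8Idiom(0x81, 1) == 0x03, "rotate wraps the top bit around");
static_assert(fshl8(0x81, 0x81, 1) == 0x03, "funnel shift of x:x is a rotate");
```

The idiom needs two shifts, a subtraction, masks, and an OR, whereas the funnel shift expresses the same result as one operation.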

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
Nico Weber c9141fc99f [gn build] Commit change that should have been in r350410.
llvm-svn: 350416
2019-01-04 17:26:05 +00:00
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
the hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Ranjeet Singh 107dd2565c Revert patches 348835 and 348571 because they're
causing code size performance regressions.

llvm-svn: 350402
2019-01-04 16:39:10 +00:00
Simon Pilgrim 71d61567c0 [CostModel][X86] Add SSE1 fp cost tests
llvm-svn: 350401
2019-01-04 16:37:01 +00:00
Simon Pilgrim 9f4dea8c06 [X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine
Repeat of the generic SimplifyDemandedBits shift combine

llvm-svn: 350399
2019-01-04 15:43:43 +00:00
Simon Pilgrim 7ee2285625 [X86] Split immediate shifts tests. NFCI.
A future patch will combine logical shifts more aggressively.

llvm-svn: 350396
2019-01-04 14:56:10 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset includes this code, where ByteOffset
and GEPOffset are both of type llvm::APInt:

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. As a result, the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign extension
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.
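
The effect can be reproduced in a scaled-down model without APInt. The sketch below uses a 16-bit accumulator in place of the wide ByteOffset and an 8-bit signed offset in place of the int64_t; the function names are hypothetical and the widths are chosen only to keep the numbers readable.

```cpp
#include <cassert>
#include <cstdint>

// Models the buggy path: the signed narrow offset is reinterpreted as an
// unsigned narrow value, which is then zero-extended to the wider width.
constexpr uint16_t addZeroExtended(uint16_t acc, int8_t off) {
  return static_cast<uint16_t>(acc + static_cast<uint8_t>(off));
}

// Models the fixed path: the offset is sign-extended to the accumulator's
// width before the addition, so negative offsets stay negative.
constexpr uint16_t addSignExtended(uint16_t acc, int8_t off) {
  return static_cast<uint16_t>(
      acc + static_cast<uint16_t>(static_cast<int16_t>(off)));
}

// With a negative offset the two paths disagree.
static_assert(addZeroExtended(100, -4) == 352, "-4 was treated as 252");
static_assert(addSignExtended(100, -4) == 96, "-4 was applied as -4");
// With a non-negative offset they agree.
static_assert(addZeroExtended(100, 4) == addSignExtended(100, 4), "");
```

The disagreement only shows up when the accumulator is wider than the offset and the offset is negative, which matches the conditions named in the message.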

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Craig Topper 6265a15f2e [X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
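
The fold rests on a simple identity for the zero flag, sketched here on 16-bit masks. This models only the zero flag, based on the usual description of these instructions (KORTEST sets it when the OR of its operands is zero, KTEST when the AND is zero); the helper names are hypothetical, and the carry flag, which the two instructions compute differently, is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Model of KAND on 16-bit mask registers.
constexpr uint16_t kand(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(a & b);
}

// Zero flag as produced by KORTEST: set when the OR of the operands is zero.
constexpr bool kortestZF(uint16_t a, uint16_t b) { return (a | b) == 0; }

// Zero flag as produced by KTEST: set when the AND of the operands is zero.
constexpr bool ktestZF(uint16_t a, uint16_t b) { return (a & b) == 0; }

// Testing the result of KAND against itself gives the same zero flag as
// KTEST on the original operands.
static_assert(kortestZF(kand(0x00F0, 0x0F00), kand(0x00F0, 0x0F00)) ==
                  ktestZF(0x00F0, 0x0F00),
              "disjoint masks: zero flag set in both forms");
static_assert(kortestZF(kand(0x00F0, 0x0030), kand(0x00F0, 0x0030)) ==
                  ktestZF(0x00F0, 0x0030),
              "overlapping masks: zero flag clear in both forms");
```

Because only the zero flag matches, the fold is restricted to users that read nothing else.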

Differential Revision: https://reviews.llvm.org/D56246

llvm-svn: 350374
2019-01-04 00:10:58 +00:00
Sanjay Patel 26ce9c38a7 revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math
There are non-codegen tests that need to be updated with this code change.

llvm-svn: 350373
2019-01-04 00:02:02 +00:00
Sanjay Patel ef4afca2ad [x86] lower extracted fadd/fsub to horizontal vector math
This would show up if we fix horizontal reductions to narrow as they go along, 
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger 
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350369
2019-01-03 23:16:19 +00:00
Heejin Ahn 777d01c756 [WebAssembly] Optimize Irreducible Control Flow
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).

Reviewers: aheejin, sunfish

Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D55467

Patch by Alon Zakai (kripken)

llvm-svn: 350367
2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen 820c6263d9 [WebAssembly] Fixed disassembler not knowing about new brlist operand
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56227

llvm-svn: 350366
2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen 9843295608 [WebAssembly] Made InstPrinter more robust
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still prints, but adds an annotation indicating the
problem, and/or indicates invalid_type etc.

We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valuable than asserting.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56223

llvm-svn: 350365
2019-01-03 22:59:59 +00:00
Sanjay Patel b8687c2168 [x86] add 512-bit vector tests for horizontal ops; NFC
llvm-svn: 350364
2019-01-03 22:55:18 +00:00
Sanjay Patel ac23c46883 [x86] add AVX512 runs for horizontal ops; NFC
llvm-svn: 350362
2019-01-03 22:42:32 +00:00