Commit Graph

58975 Commits

Author SHA1 Message Date
Sanjay Patel 58256802ce [x86] add partial undef 'and' test; NFC
llvm-svn: 351840
2019-01-22 17:01:06 +00:00
Simon Pilgrim 90fa50d928 [llvm-mca][X86] Add missing CLWB/CLZERO/FSGSBASE/LWP/MWAITX/RDPID/SHA tests
We're getting pretty close to matching/exceeding test coverage of the test\CodeGen\X86\*-schedule.ll files, which should allow us to get rid of -print-schedule and fix PR37160

llvm-svn: 351836
2019-01-22 16:39:28 +00:00
Simon Pilgrim fc4b1e841e [llvm-mca][X86] Add missing enter/leave, invlpg/invlpga, rdmsr/wrmsr, rdpmc and rdtsc/rdtscp tests
llvm-svn: 351835
2019-01-22 16:29:26 +00:00
Sanjay Patel 6019e6f866 [x86] add another partial undef vector binop test; NFC
The existing test unintentionally shows that we have prematurely
optimized the shuffle into a vector concat and lost the undef info, 
so it is not affected by a basic improvement to 
SimplifyDemandedVectorElts.

llvm-svn: 351834
2019-01-22 16:26:09 +00:00
Simon Pilgrim 4e03b2496d [llvm-mca][X86] Add missing mfence/pinsrw tests
llvm-svn: 351831
2019-01-22 16:01:08 +00:00
Simon Pilgrim 05198a9b8a [llvm-mca][X86] Add missing monitor/mwait tests
These technically should be under a MONITOR cpuid bit, but we tag them as SSE3 so I've done that here as well.

llvm-svn: 351829
2019-01-22 15:48:16 +00:00
Simon Pilgrim 9b3a2f96a1 [llvm-mca][X86] Add missing vperm2i128 tests
llvm-svn: 351828
2019-01-22 14:54:24 +00:00
Simon Pilgrim 1d8d6c3bfb [llvm-mca][X86] Add missing tzcntw tests
llvm-svn: 351827
2019-01-22 14:53:52 +00:00
Sanjay Patel effee52c59 [DAGCombiner] narrow vector binop with 2 insert subvector operands
vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z

This is another step in generic vector narrowing. It's also a step towards more horizontal op 
formation specifically for x86 (although we still failed to match those in the affected tests).

The scalarization cases are also not optimal (we should be scalarizing those), but it's still 
an improvement to use a narrower vector op when we know part of the result must be constant 
because both inputs are undef in some vector lanes.

I think a similar match but checking for a constant operand might help some of the cases in 
D51553.

Differential Revision: https://reviews.llvm.org/D56875

llvm-svn: 351825
2019-01-22 14:24:13 +00:00
Andrea Di Biagio a4d1ffc269 [MCA] Add tests for int-to-fpu transfer delays. NFC
llvm-svn: 351822
2019-01-22 13:59:08 +00:00
Simon Pilgrim 933673d878 [X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))
For constant bit select patterns, replace one AND with a ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP as its VPCMOV can handle most folding commutations.

This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns.

Differential Revision: https://reviews.llvm.org/D55935

llvm-svn: 351819
2019-01-22 13:44:49 +00:00
Simon Pilgrim aa6a4339ac [X86][BtVer2] SSE2 vector shifts has local forwarding disabled
Similar to horizontal ops on D56777, the sse2 (but not mmx) bit shift ops has local forwarding disabled, adding +1cy to the use latency for the result.

Differential Revision: https://reviews.llvm.org/D57026

llvm-svn: 351817
2019-01-22 13:27:18 +00:00
Simon Pilgrim 2c69f90171 [X86][BtVer2] X86ISD::VPERMILPV has local forwarding disabled
Similar to horizontal ops on D56777, the vpermilpd/vpermilps variable mask ops has local forwarding disabled, adding +1cy to the use latency for the result.

Differential Revision: https://reviews.llvm.org/D57022

llvm-svn: 351815
2019-01-22 13:13:57 +00:00
Martin Storsjo f614ace3e4 Revert "[llvm-objcopy] [COFF] Implement --add-gnu-debuglink"
This reverts commit r351801, as it caused errors on (so far)
ppc64be and aarch64 buildbots - the reason is yet unknown.

llvm-svn: 351811
2019-01-22 12:35:34 +00:00
Simon Pilgrim ee900efb30 [CostModel][X86] Add ICMP Predicate specific costs
First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.

Differential Revision: https://reviews.llvm.org/D57013

llvm-svn: 351810
2019-01-22 12:29:38 +00:00
Simon Pilgrim 180fcff5a7 [X86][SSE] Add selective commutation support for insertps (PR40340)
When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding.

Differential Revision: https://reviews.llvm.org/D56843

llvm-svn: 351807
2019-01-22 12:17:48 +00:00
Alex Bradbury cd26560e46 [RISCV] Quick fix for PR40333
Avoid the infinite loop caused by the target DAG combine converting ANYEXT to
SIGNEXT and the target-independent DAG combine logic converting back to
ANYEXT. Do this by not adding the new node to the worklist.

Committing directly as this definitely doesn't make the problem any worse, and
I intend to follow-up with a patch that avoids this custom combiner logic
altogether and just lowers the i32 operations to a target-specific
SelectionDAG node. This should be easier to reason about and improve codegen
quality in some cases (though may miss out on some later DAG combines).

llvm-svn: 351806
2019-01-22 12:11:53 +00:00
Max Kazantsev feb475f4cf [LoopPredication] Support guards expressed as branches by widenable condition
This patch adds support of guards expressed as branches by widenable
conditions in Loop Predication.

Differential Revision: https://reviews.llvm.org/D56081
Reviewed By: reames

llvm-svn: 351805
2019-01-22 11:49:06 +00:00
Simon Pilgrim 372afb7ec4 [X86] Add test for matchAddressRecursively's MUL handling
Noticed in code coverage tests that this isn't tested.

llvm-svn: 351804
2019-01-22 11:39:21 +00:00
Martin Storsjo 1bf1964a15 [llvm-objcopy] [COFF] Implement --add-gnu-debuglink
Differential Revision: https://reviews.llvm.org/D57007

llvm-svn: 351801
2019-01-22 10:58:18 +00:00
Martin Storsjo 9ec18a3718 [llvm-objcopy] [COFF] Update symbol indices in weak externals
Differential Revision: https://reviews.llvm.org/D57006

llvm-svn: 351800
2019-01-22 10:58:09 +00:00
Martin Storsjo 8010c6beaf [llvm-objcopy] Consistently use createStringError instead of make_error<StringError>
This was requested in the review of D57006.

Also add missing quotes around symbol names in error messages.

Differential Revision: https://reviews.llvm.org/D57014

llvm-svn: 351799
2019-01-22 10:57:59 +00:00
James Henderson 21abd5df51 [NFC][llvm-readobj]Normalise --/- inconsistency in test options
llvm-svn: 351798
2019-01-22 10:57:21 +00:00
Chandler Carruth 285fe716c5 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

llvm-svn: 351796
2019-01-22 10:29:58 +00:00
James Henderson 33c16a3f16 [llvm-symbolizer] Add support for --basenames/-s
This fixes https://bugs.llvm.org/show_bug.cgi?id=40068.

--basenames is a GNU addr2line switch which strips the directory names
from the file path in the output.

Reviewed by: ruiu

Differential Revision: https://reviews.llvm.org/D56919

llvm-svn: 351795
2019-01-22 10:24:32 +00:00
James Henderson 5fc812f176 [llvm-readelf]Revert --dyn-symbols behaviour to make it GNU compatible, and add new --hash-symbols switch for old behaviour
In r287786, the behaviour of --dyn-symbols in llvm-readelf (but not
llvm-readobj) was changed to print the dynamic symbols as derived from
the hash table, rather than to print the dynamic symbol table contents
directly. The original change was initially submitted without review,
and some comments were made on the commit mailing list implying that the
new behavious is GNU compatible. I argue that it is not:

  1) It does not include a null symbol.
  2) It prints the symbols based on an order derived from the hash
     table.
  3) It prints an extra column indicating which bucket it came from.
     This could break parsers that expect a fixed number of columns,
     with the first column being the symbol index.
  4) If the input happens to have both .hash and .gnu.hash section, it
     prints interpretations of them both, resulting in most symbols
     being printed twice.
  5) There is no way of just printing the raw dynamic symbol table,
     because --symbols also prints the static symbol table.

This patch reverts the --dyn-symbols behaviour back to its old behaviour
of just printing the contents of the dynamic symbol table, similar to
what is printed by --symbols. As the hashed interpretation is still
desirable to validate the hash table, it puts it under a new switch
"--hash-symbols". This is a no-op on all output forms except for GNU
output style for ELF. If there is no hash table, it does nothing,
unlike the previous behaviour which printed the raw dynamic symbol
table, since the raw dynsym is available under --dyn-symbols.

The yaml input for the test is based on that in
test/tools/llvm-readobj/demangle.test, but stripped down to the bare
minimum to provide a valid dynamic symbol.

Note: some LLD tests needed updating. I will commit a separate patch for
those.

Reviewed by: grimar, rupprecht

Differential Revision: https://reviews.llvm.org/D56910

llvm-svn: 351789
2019-01-22 09:35:35 +00:00
Matt Arsenault bfdba5e4fc IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

llvm-svn: 351778
2019-01-22 03:32:36 +00:00
Eli Friedman 1eaa04d682 [ARM] Combine ands+lsls to lsls+lsrs for Thumb1.
This patch may seem familiar... but my previous patch handled the
equivalent lsls+and, not this case.  Usually instcombine puts the
"and" after the shift, so this case doesn't come up. However, if the
shift comes out of a GEP, it won't get canonicalized by instcombine,
and DAGCombine doesn't have an equivalent transform.

This also modifies isDesirableToCommuteWithShift to suppress DAGCombine
transforms which would make the overall code worse.

I'm not really happy adding a bunch of code to handle this, but it would
probably be tricky to substantially improve the behavior of DAGCombine
here.

Differential Revision: https://reviews.llvm.org/D56032

llvm-svn: 351776
2019-01-22 01:51:37 +00:00
Philip Reames 390c0e2f72 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change.

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Matt Arsenault c19e17dd90 GlobalISel: Fix out of bounds crashes in verifier
llvm-svn: 351769
2019-01-22 00:29:37 +00:00
Eli Friedman 23e60c7893 [AArch64] Add patterns for zext/sext of shift amount.
Not sure this is the best fix, but it saves an instruction for certain
constructs involving variable shifts.

Differential Revision: https://reviews.llvm.org/D55572

llvm-svn: 351768
2019-01-22 00:21:35 +00:00
Matt Arsenault fb67164ebc AMDGPU/GlobalISel: Legalize more fp<->int conversions
llvm-svn: 351767
2019-01-22 00:20:17 +00:00
Sanjay Patel 9884dd1034 [x86] add another test for xor with undefs; NFC
llvm-svn: 351764
2019-01-21 22:12:35 +00:00
Sanjay Patel 43345f29f4 [x86] add tests for vector ops with undef lanes; NFC
llvm-svn: 351763
2019-01-21 21:52:27 +00:00
Stanislav Mekhanoshin f92ed6966e [AMDGPU] Fixed hazard recognizer to walk predecessors
Fixes two problems with GCNHazardRecognizer:
1. It only scans up to 5 instructions emitted earlier.
2. It does not take control flow into account. An earlier instruction
from the previous basic block is not necessarily a predecessor.
At the same time a real predecessor block is not scanned.

The patch provides a way to distinguish between scheduler and
hazard recognizer mode. It is OK to work with emitted instructions
in the scheduler because we do not really know what will be emitted
later and its order. However, when pass works as a hazard recognizer
the schedule is already finalized, and we have full access to the
instructions for the whole function, so we can properly traverse
predecessors and their instructions.

Differential Revision: https://reviews.llvm.org/D56923

llvm-svn: 351759
2019-01-21 19:11:26 +00:00
Simon Pilgrim 9b73ae96c5 [X86][BtVer2] Update latency of mmx horizontal operations
D56777 added +1cy local forwarding penalty for horizontal operations, but this penalty only affects sse2/xmm variants, the mmx variants don't suffer the penalty.

Confirmed with @andreadb

llvm-svn: 351755
2019-01-21 18:04:25 +00:00
Sanjay Patel fe3a1b56eb [AArch64] add more tests for buildvec to shuffle transform; NFC
These are copied from the sibling x86 file. I'm not sure which
of the current outputs (if any) is considered optimal, but
someone more familiar with AArch may want to take a look.

llvm-svn: 351754
2019-01-21 17:46:35 +00:00
Sanjay Patel e713c47d49 [DAGCombiner] fix crash when converting build vector to shuffle
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern 
there too, so we could avoid the back-and-forth using a shuffle.

llvm-svn: 351753
2019-01-21 17:30:14 +00:00
Andrea Di Biagio b68dd05c14 [X86][BtVer2] Update the WriteLoad latency.
r327630 introduced new write definitions for float/vector loads.
Before that revision, WriteLoad was used by both integer/float (scalar/vector)
load. So, WriteLoad had to conservatively declare a latency to 5cy. That is
because the load-to-use latency for float/vector load is 5cy.

Now that we have dedicated writes for float/vector loads, there is no reason why
we should keep the latency of WriteLoad to 5cy. At the moment, WriteLoad is only
used by scalar integer loads only; we can assume an optimstic 3cy latency for
them.
This patch changes that latency from 5cy to 3cy, and regenerates the affected
scheduling/mca tests.

Differential Revision: https://reviews.llvm.org/D56922

llvm-svn: 351742
2019-01-21 12:04:10 +00:00
Simon Pilgrim 44feb4a87b [CostModel][X86] Add XOP icmp cost tests (PR40376)
llvm-svn: 351741
2019-01-21 11:33:52 +00:00
Dmitry Venikov 119cf66fa5 [llvm-symbolizer] Add -no-demangle as alias for -demangle=false
Summary: Provides -no-demangle as alias for -demangle=false. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40075

Reviewers: jhenderson, ruiu

Reviewed By: jhenderson

Subscribers: erik.pilkington, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D56773

llvm-svn: 351735
2019-01-21 10:00:57 +00:00
Craig Topper f608dc1f57 [X86] Remove and autoupgrade vpmovqd/vpmovwb intrinsics using trunc+select.
llvm-svn: 351729
2019-01-21 08:16:59 +00:00
Kito Cheng 5e8798f987 [RISCV] Add R_RISCV_RELAX relocation to all possible relax candidates.
Summary:
Add R_RISCV_RELAX relocation to all possible relax candidates and
update corresponding testcase.

Reviewers: asb, apazos

Differential Revision: https://reviews.llvm.org/D46677

llvm-svn: 351723
2019-01-21 05:27:09 +00:00
Dylan McKay 5c23410fdf [AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough
This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug https://github.com/avr-rust/rust/issues/123.

llvm-svn: 351721
2019-01-21 04:32:02 +00:00
Dylan McKay ce0ab06353 Revert "[AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough"
This reverts commit r351718.

Carl pointed out that the unit test could be improved.

This patch will be recommitted once the test is made more resilient.

llvm-svn: 351719
2019-01-21 02:46:13 +00:00
Dylan McKay 33acba43f0 [AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough
This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug https://github.com/avr-rust/rust/issues/123.

llvm-svn: 351718
2019-01-21 02:44:09 +00:00
Matt Arsenault 7ac79ed8f0 AMDGPU: Legalize more bitcasts
llvm-svn: 351700
2019-01-20 19:45:18 +00:00
Matt Arsenault 46ffe68d77 AMDGPU/GlobalISel: Really legalize exts from i1
There is a combine that was hiding these tests
not actually testing what they should be, although
they were producing the expected end result.

llvm-svn: 351698
2019-01-20 19:28:20 +00:00
Simon Pilgrim e1143c1322 [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351697
2019-01-20 19:27:40 +00:00
Matt Arsenault 745fd9f547 GlobalISel: Implement widenScalar for basic FP ops
llvm-svn: 351696
2019-01-20 19:10:31 +00:00
Matt Arsenault cfd9e7f594 AMDGPU/GlobalISel: Legalize f32->f16 fptrunc
llvm-svn: 351695
2019-01-20 19:10:26 +00:00
Matt Arsenault ff6a9a275b AMDGPU/GlobalISel: Fix some crashs in g_unmerge_values/g_merge_values
This was crashing in the predicate function assuming the value
is a vector.

Copy more of what AArch64 uses. This probably needs more refinement
later, but I don't exactly understand what it means in some cases,
particularly since any legalization for these seems to be missing.

llvm-svn: 351693
2019-01-20 18:40:36 +00:00
Matt Arsenault 2a2086b830 AMDGPU/GlobalISel: Regbank select for fpext
llvm-svn: 351692
2019-01-20 18:35:41 +00:00
Matt Arsenault 24563ef628 AMDGPU/GlobalISel: Cleanup legality for extensions
llvm-svn: 351691
2019-01-20 18:34:24 +00:00
Simon Pilgrim b590e4f7e5 [X86] Auto upgrade old style VPCOM/VPCOMU intrinsics to generic integer comparisons
We were upgrading these to the new style VPCOM/VPCOMU intrinsics (which includes the condition code immediate), but we'll be getting rid of those shortly, so convert these to generics first.

This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351690
2019-01-20 17:36:22 +00:00
Simon Pilgrim 4fd2459c4d [X86] Replace VPCOM/VPCOMU with generic integer comparisons (llvm)
These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases.

Noticed while cleaning up vector integer comparison costs for PR40376.

A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics.

llvm-svn: 351688
2019-01-20 16:40:44 +00:00
Simon Pilgrim c934d3a01b [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.

Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.

llvm-svn: 351685
2019-01-20 13:55:01 +00:00
Simon Pilgrim 1231904c48 [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets
Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era

llvm-svn: 351684
2019-01-20 13:21:43 +00:00
Simon Pilgrim 60e5a3accb [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes
llvm-svn: 351682
2019-01-20 12:10:42 +00:00
Simon Pilgrim 5d7182ecb6 [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets
llvm-svn: 351681
2019-01-20 11:23:01 +00:00
Simon Pilgrim a8b009fd14 [CostModel][X86] Add non-constant vselect cost tests
Also add AVX512 costs at the same time

llvm-svn: 351680
2019-01-20 11:19:35 +00:00
Dylan McKay a6241a5dc0 [AVR] Remove unneeded XFAILs from the Generic CodeGen tests
These have been in place for quite a while now.

Several bugs have since been fixed, and these tests now pass.

llvm-svn: 351679
2019-01-20 11:16:58 +00:00
Dylan McKay 6afef286d9 [AVR] Fix codegen bug in 16-bit loads
Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
instructions of this pattern:

    ld  $GPR8, [PTR:XYZ]+
    ld  $GPR8, [PTR]+1

This has a problem; the [PTR] is incremented in-place once, but never
decremented.

Future uses of the same pointer will use the now clobbered value,
leading to the pointer being incorrect by an offset of one.

This patch modifies the expansion code of the LDWRdPtr pseudo
instruction so that the pointer variable is not silently clobbered in
future uses in the same live range.

Bug first reported by Keshav Kini.

Patch by Kaushik Phatak.

llvm-svn: 351673
2019-01-20 03:41:08 +00:00
Dylan McKay 52846ab09a Revert "[AVR] Fix codegen bug in 16-bit loads"
This reverts commit r351544.

In that commit, I had mistakenly misattributed the issue submitter as
the patch author, Kaushik Phatak.

The patch will be recommitted immediately with the correct attribution.

llvm-svn: 351672
2019-01-20 03:41:00 +00:00
Martin Storsjo e8305175b0 [llvm-objcopy] [COFF] Implement --only-section
Differential Revision: https://reviews.llvm.org/D56873

llvm-svn: 351663
2019-01-19 19:42:54 +00:00
Martin Storsjo 1868d88b2e [llvm-objcopy] [COFF] Implement --only-keep-debug
Differential Revision: https://reviews.llvm.org/D56840

llvm-svn: 351662
2019-01-19 19:42:48 +00:00
Martin Storsjo 78a0b418b4 [llvm-objcopy] [COFF] Implement --strip-debug
Also remove sections similarly for --strip-all, --discard-all,
--strip-unneeded.

Differential Revision: https://reviews.llvm.org/D56839

llvm-svn: 351661
2019-01-19 19:42:41 +00:00
Martin Storsjo f9e1434ef4 [llvm-objcopy] [COFF] Add support for removing sections
Differential Revision: https://reviews.llvm.org/D56683

llvm-svn: 351660
2019-01-19 19:42:35 +00:00
Martin Storsjo e9f62f62ce [llvm-objcopy] [COFF] Add a testcase for patching the debug directory. NFC.
The debug directory contains the rwa file address of itself,
which is updated on write. Add a testcase for this existing
functionality.

Differential Revision: https://reviews.llvm.org/D56876

llvm-svn: 351659
2019-01-19 19:42:27 +00:00
Martin Storsjo f11509ab11 [llvm-objcopy] [COFF] Rename a test from .yaml to .test. NFC.
Tests named .yaml aren't executed by default in this directory
(while they are within e.g. LLD).

llvm-svn: 351657
2019-01-19 19:42:19 +00:00
Nikita Popov 6515db205a [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.

Differential Revision: https://reviews.llvm.org/D56355

llvm-svn: 351645
2019-01-19 09:56:01 +00:00
Johannes Doerfert 36872b5db9 Enable IPConstantPropagation to work with abstract call sites
This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.

Differential Revision: https://reviews.llvm.org/D56447

llvm-svn: 351628
2019-01-19 05:19:12 +00:00
Johannes Doerfert 18251842c6 AbstractCallSite -- A unified interface for (in)direct and callback calls
An abstract call site is a wrapper that allows to treat direct,
  indirect, and callback calls the same. If an abstract call site
  represents a direct or indirect call site it behaves like a stripped
  down version of a normal call site object. The abstract call site can
  also represent a callback call, thus the fact that the initially
  called function (=broker) may invoke a third one (=callback callee).
  In this case, the abstract call side hides the middle man, hence the
  broker function. The result is a representation of the callback call,
  inside the broker, but in the context of the original instruction that
  invoked the broker.

  Again, there are up to three functions involved when we talk about
  callback call sites. The caller (1), which invokes the broker
  function. The broker function (2), that may or may not invoke the
  callback callee. And finally the callback callee (3), which is the
  target of the callback call.

  The abstract call site will handle the mapping from parameters to
  arguments depending on the semantic of the broker function. However,
  it is important to note that the mapping is often partial. Thus, some
  arguments of the call/invoke instruction are mapped to parameters of
  the callee while others are not. At the same time, arguments of the
  callback callee might be unknown, thus "null" if queried.

  This patch introduces also !callback metadata which describe how a
  callback broker maps from parameters to arguments. This metadata is
  directly created by clang for known broker functions, provided through
  source code attributes by the user, or later deduced by analyses.

For motivation and additional information please see the corresponding
talk (slides/video)
  https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20
as well as the LCPC paper
  http://compilers.cs.uni-saarland.de/people/doerfert/par_opt_lcpc18.pdf

Differential Revision: https://reviews.llvm.org/D54498

llvm-svn: 351627
2019-01-19 05:19:06 +00:00
Roman Tereshin a0383d6c1f Reapply "[CGP] Check for existing inttotpr before creating new one"
Original commit: r351582

llvm-svn: 351626
2019-01-19 03:37:25 +00:00
Vedant Kumar b537b946b8 [MergeFunc] Allow merging identical vararg functions using aliases
Thanks to Nikita Popov for pointing out this missed case.

This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).

Differential Revision: https://reviews.llvm.org/D56865

llvm-svn: 351624
2019-01-19 02:46:22 +00:00
Vedant Kumar b755a2df51 [HotColdSplit] Mark inherently cold functions as such
If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.

As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.

llvm-svn: 351623
2019-01-19 02:38:47 +00:00
Vedant Kumar 17d9f14bff [CodeExtractor] Emit lifetime markers around reloads of outputs
CodeExtractor permits extracting a region of blocks from a function even
when values defined within the region are used outside of it.

This is typically done by creating an alloca in the original function
and reloading the alloca after a call to the extracted function.

Wrap the reload in lifetime start/end markers to promote stack coloring.

Suggested by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D56045

llvm-svn: 351621
2019-01-19 02:37:59 +00:00
Roman Tereshin 022bf3e8e7 Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""
This reverts commit r351618.

Compiler RT + ASAN tests are failing for PowerPC. Not sure
how would I reproduce these on macOS, so reverting (again)
until I do.

llvm-svn: 351619
2019-01-19 01:53:26 +00:00
Roman Tereshin dd6f9f68bb Reapply "[CGP] Check for existing inttotpr before creating new one"
Original commit: r351582

llvm-svn: 351618
2019-01-19 01:41:03 +00:00
Amara Emerson d5015edb37 Revert r351584: "GlobalISel: Verify g_zextload and g_sextload"
This new assertion triggered on the AArch64 GlobalISel bots. Reverting while it's being investigated.

llvm-svn: 351617
2019-01-19 00:36:11 +00:00
Nico Weber 63fd07ce07 Use llvm_canonicalize_cmake_booleans for LLVM_LIBXML2_ENABLED [llvm]
r291284 added a nice mechanism to consistently pass CMake on/off toggles to
lit. This change uses it for LLVM_LIBXML2_ENABLED too (which was added around
the same time and doesn't use the new system yet).

Also alphabetically sort the list passed to llvm_canonicalize_cmake_booleans()
in llvm/test/CMakeLists.txt.

No intended behavior change.

Differential Revision: https://reviews.llvm.org/D56912

llvm-svn: 351615
2019-01-19 00:10:54 +00:00
Matt Arsenault 96e4701401 AMDGPU/GlobalISel: Legalize more types for select
llvm-svn: 351599
2019-01-18 21:42:55 +00:00
Roman Tereshin 86ac532687 Revert "[CGP] Check for existing inttotpr before creating new one"
This reverts commit r351582.

Bots are failing. Reverting this to fix and re-commit later.

llvm-svn: 351598
2019-01-18 21:38:44 +00:00
Matt Arsenault 4599159ac3 AMDGPU/GlobalISel: Legalize illegal g_constant
llvm-svn: 351596
2019-01-18 21:33:50 +00:00
Matt Arsenault bd3a5b29cb GlobalISel: Verify G_BITCAST
llvm-svn: 351594
2019-01-18 21:04:59 +00:00
Armando Montanez 56d18121e2 [elfabi] Add support for reading DT_NEEDED from binaries
This patch gives elfabi the ability to read DT_NEEDED entries from ELF binaries
to populate NeededLibs in TextAPI's ELFStub.

Differential Revision: https://reviews.llvm.org/D55852

llvm-svn: 351592
2019-01-18 20:56:03 +00:00
Matt Arsenault 215c4f68f6 GlobalISel: Verify G_ICMP/G_FCMP vector types
llvm-svn: 351591
2019-01-18 20:49:17 +00:00
Sanjay Patel 4453e4292d [x86] add more movmsk tests; NFC
The existing tests already show a sub-optimal transform,
but this should make it clear that we can't just match
an 'and' op when creating movmsk instructions.

llvm-svn: 351590
2019-01-18 20:42:12 +00:00
Teresa Johnson 723636ee8c Make ThinLTO test run single threaded to try to avoid flakiness
To see if this helps flaky bot failures in PR40351.

llvm-svn: 351589
2019-01-18 20:41:49 +00:00
Matt Arsenault f67ae61131 GlobalISel: Verify g_zextload and g_sextload
llvm-svn: 351584
2019-01-18 20:17:37 +00:00
Roman Tereshin 85a0467a11 [CGP] Check for existing inttotpr before creating new one
Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.

This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56838

llvm-svn: 351582
2019-01-18 20:13:42 +00:00
Craig Topper b9d4461f9f [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode.:
This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.

Differential Revision: https://reviews.llvm.org/D56827

llvm-svn: 351570
2019-01-18 18:22:26 +00:00
Neil Henning 3ed09f8e0c [AMDGPU] Add some missing always-uniform values.
This commit adds some missing intrinsics into the isAlwaysUniform list
for the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D56845

llvm-svn: 351562
2019-01-18 16:39:27 +00:00
Simon Pilgrim fbca7094c9 [LTO] Change test/tools/lto/no-bitcode.s requirement from arm to aarch64
Set the test to properly require aarch64 instead of arm. Otherwise, this test fails with LLVM_TARGETS_TO_BUILD='ARM;X86'

bin/llvm-mc: : error: unable to get target for 'arm64-apple-ios7.0.0'

Committed on behalf of @easyaspi314 (Devin)

Differential Revision: https://reviews.llvm.org/D56472

llvm-svn: 351560
2019-01-18 15:57:59 +00:00
Dmitry Preobrazhensky 6bc26aaada [AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands
See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D56794

llvm-svn: 351555
2019-01-18 15:17:17 +00:00
Dmitry Preobrazhensky 61105bab29 [AMDGPU][MC] Disabled use of 2 different literals with SOP2/SOPC instructions
See bug 39319: https://bugs.llvm.org/show_bug.cgi?id=39319

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D56847

llvm-svn: 351549
2019-01-18 13:57:43 +00:00
George Rimar 5e36433189 [llvm-objdump] - Dump the archive headers when -all-headers is specified.
When -all-headers is given it is supposed to dump all headers,
but now it skips the archive headers for no reason.

The patch fixes that.

Differential revision: https://reviews.llvm.org/D56780

llvm-svn: 351547
2019-01-18 12:01:59 +00:00
Dylan McKay 77364be497 [AVR] Fix codegen bug in 16-bit loads
Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
instructions of this pattern:

    ld  $GPR8, [PTR:XYZ]+
    ld  $GPR8, [PTR]+1

This has a problem; the [PTR] is incremented in-place once, but never
decremented.

Future uses of the same pointer will use the now clobbered value,
leading to the pointer being incorrect by an offset of one.

This patch modifies the expansion code of the LDWRdPtr pseudo
instruction so that the pointer variable is not silently clobbered in
future uses in the same live range.

Patch by Keshav Kini.

llvm-svn: 351544
2019-01-18 11:27:38 +00:00
Dylan McKay 0154977e97 [AVR] Fix the inst-cbr test
Now that the CBR alias has lower priority than ANDI, the assembly
printer uses ANDI instead.

Original broken in r351526.

llvm-svn: 351539
2019-01-18 10:11:33 +00:00
Shiva Chen e84c729aca [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent
We should not pre-scheduled the node has ADJCALLSTACKDOWN parent,
or else, when bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold CallResource too long and make other
calls can't be scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating copy to avoid the conflict which will fail because
CallResource is not a real physical register.

llvm-svn: 351527
2019-01-18 08:36:06 +00:00
Hsiangkai Wang 66609a8255 [CodeGen] Fix bugs in LiveDebugVariables when debug labels are generated.
Remove DBG_LABELs in LiveDebugVariables and generate them in
VirtRegRewriter.

This bug is reported in
https://bugs.chromium.org/p/chromium/issues/detail?id=898152.

Differential Revision: https://reviews.llvm.org/D54465

llvm-svn: 351525
2019-01-18 07:17:09 +00:00
Dylan McKay 7203e00b5e [AVR] Expand 8/16-bit multiplication to libcalls on MCUs that don't have hardware MUL
This change modifies the LLVM ISel lowering settings so that
8-bit/16-bit multiplication is expanded to calls into the compiler
runtime library if the MCU being targeted does not support
multiplication in hardware.

Before this, MUL instructions would be generated on CPUs like the
ATtiny85, triggering a CPU reset due to an illegal instruction at
runtime.

First raised in https://github.com/avr-rust/rust/issues/124.

llvm-svn: 351523
2019-01-18 06:10:41 +00:00
Craig Topper f47bef661e [X86] Add test cases showing failure to fold a global variable address into the gather addressing mode when using the target specific intrinsics. NFC
llvm-svn: 351522
2019-01-18 06:06:03 +00:00
Craig Topper 13c7a9f81f [X86] Change avx512-gather-scatter-intrin.ll to use x86_64-unknown-unknown instead of x86_64-apple-darwin. NFC
Will help with an upcoming patch.

llvm-svn: 351521
2019-01-18 06:06:01 +00:00
Nico Weber 71ac58e311 mac: Correctly disable tools/lto tests when building with LLVM_ENABLE_PIC=OFF
llvm/tools sets LLVM_TOOL_LTO_BUILD to Off if LLVM_ENABLE_PIC=OFF, but that's
not visible in llvm/test.

r289662 added the llvm_tool_lto_build lit parameter, there the intent was to
use it with an explicit -DLLVM_TOOL_LTO_BUILD=OFF, which is visible globally.
On the review for that (D27739), a mild preference was expressed for using a
lit parameter over checking the existence of libLTO.dylib. Since that works
with the LLVM_ENABLE_PIC=OFF case too and since it matches what we do for the
gold plugin, switch to that approach.

Differential Revision: https://reviews.llvm.org/D56805

llvm-svn: 351515
2019-01-18 03:36:04 +00:00
Thomas Lively c6795e07f0 [WebAssembly] Add languages from debug info to producers section
Reviewers: aheejin, dschuff, sbc100

Subscribers: aprantl, jgravelle-google, hiraditya, sunfish

Differential Revision: https://reviews.llvm.org/D56889

llvm-svn: 351507
2019-01-18 02:47:48 +00:00
Matt Arsenault 456b93b4c2 AMDGPU: Convert tests away from llvm.SI.load.const
llvm-svn: 351494
2019-01-17 22:47:26 +00:00
Vedant Kumar f529b50702 [HotColdSplit] Allow outlining with live outputs
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.

Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).

As a drive-by, remove an errant TODO.

llvm-svn: 351492
2019-01-17 22:36:05 +00:00
Vedant Kumar b70e20db62 [HotColdSplit] Consider resume instructions to be cold
Resuming exception unwinding is roughly as unlikely as throwing an
exception.

Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.

llvm-svn: 351491
2019-01-17 22:35:47 +00:00
Vladimir Stefanovic 3daf8bc96a [mips] Emit .reloc R_{MICRO}MIPS_JALR along with j(al)r(c) $25
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and then used by asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.

Differential revision: https://reviews.llvm.org/D56694

llvm-svn: 351485
2019-01-17 21:50:37 +00:00
Vedant Kumar 4541be0686 [HotColdSplit] Relax requirement that the cold sink block be extractable
Relaxing this requirement creates opportunities to split code dominated
by an EH pad.

Tested on LNT+externals.

llvm-svn: 351483
2019-01-17 21:42:36 +00:00
Vedant Kumar 32a014d048 [HotColdSplit] Simplify tests by lowering their splitting thresholds
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.

llvm-svn: 351480
2019-01-17 21:29:34 +00:00
Wei Mi 3bcccdfe38 [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink
If the sample profile has no inlining hierachy information included, we call
the sample profile is flattened. For flattened profile, in ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort to read in the profile and run
the sample profile loader pass. It is helpful for reducing compile time when
the flattened profile is huge.

Differential Revision: https://reviews.llvm.org/D54819

llvm-svn: 351476
2019-01-17 20:48:34 +00:00
Reid Kleckner edd653bc07 [InstCombine] Don't sink dynamic allocas
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.

Fixes PR40365

Reviewers: hfinkel, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56872

llvm-svn: 351475
2019-01-17 20:46:53 +00:00
Sanjin Sijaric 4d1450298c Fix the buildbot failure introduced by r351404
EXPENSIVE_CHECKS buildbots are failing due to r351404.

Add x1 as live in to the funclet basic block for SEH funclets, as well as
-verify-machineinstrs to the test case that triggered the failure.

llvm-svn: 351472
2019-01-17 20:24:14 +00:00
Wouter van Oortmerssen 73a0c240cf [WebAssembly] Fixing 2 more symbol offset related tests.
llvm-svn: 351465
2019-01-17 19:18:05 +00:00
Jonas Devlieghere 032ef2dc00 [Test] Fix debug-loc-0.mir with EXPENSIVE_CHECKS
The `llc` invocation was missing `-start-before=machine-cp`.

llvm-svn: 351464
2019-01-17 18:35:14 +00:00
Wouter van Oortmerssen f3b762a0b6 [WebAssembly] Fixed objdump not parsing function headers.
Summary:
objdump was interpreting the function header containing the locals
declaration as instructions. To parse these without injecting target
specific code in objdump, MCDisassembler::onSymbolStart was added to
be implemented by the WebAssembly implemention.

WasmObjectFile now returns a code offset for the "address" of a symbol,
rather than the index. This is also more in-line with what other
targets do.

Also ensured that the AsmParser correctly puts each function
in its own segment to enable this test case.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, aheejin, sunfish, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D56684

llvm-svn: 351460
2019-01-17 18:14:09 +00:00
Teresa Johnson 8d86f1ba47 Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson 0be9960f28 Add -dump-input=always to cfi-devirt test to debug flake
To help diagnose flaky bot failures in PR40351.

llvm-svn: 351454
2019-01-17 15:49:04 +00:00
Teresa Johnson 4fcf3b1621 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
James Henderson e50d9cb364 [llvm-readobj][ELF]Add demangling support
This change adds demangling support to the ELF side of llvm-readobj,
under the switch --demangle/-C.

The following places are demangled: symbol table dumps (static and
dynamic), relocation dumps (static and dynamic), addrsig dumps, call
graph profile dumps, and group section signature symbols.

Although GNU readelf doesn't support demangling, it is still a useful
feature to have, and brings it on a par with llvm-objdump's
capabilities.

This fixes https://bugs.llvm.org/show_bug.cgi?id=40054.

Reviewed by: grimar, rupprecht

Differential Revision: https://reviews.llvm.org/D56791

llvm-svn: 351450
2019-01-17 15:34:12 +00:00
Max Kazantsev 61a8d3fb33 [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.

Differential Revision: https://reviews.llvm.org/D56144
Reviewed By: fedor.sergeev

llvm-svn: 351434
2019-01-17 12:51:10 +00:00
Max Kazantsev 8b134169f5 [LoopSimplifyCFG] Fix order of deletion of complex dead subloops
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfull this requirement and
fail an assertion.

This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.

Differential Revision: https://reviews.llvm.org/D56121
Reviewed By: fedor.sergeev

llvm-svn: 351433
2019-01-17 12:25:40 +00:00
Simon Pilgrim 6cc9c3cd75 [X86][SSE] Add PR40340 test case
llvm-svn: 351430
2019-01-17 11:20:23 +00:00
Simon Pilgrim 8260bf9db2 [X86] Add AVX512 test to insertps
Pre-commit for PR40340

llvm-svn: 351429
2019-01-17 11:11:15 +00:00
Matt Arsenault 0cb08e448a Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Diana Picus d5c2499aec [ARM GlobalISel] Allow calls to varargs functions
Allow varargs functions to be called, both in arm and thumb mode. This
boils down to choosing the correct calling convention, which we can
easily test by making sure arm_aapcscc is used instead of
arm_aapcs_vfpcc when the callee is variadic.

llvm-svn: 351424
2019-01-17 10:11:55 +00:00
Alex Bradbury 07f1c62371 [RISCV] Add codegen support for RV64A
In order to support codegen RV64A, this patch:
* Introduces masked atomics intrinsics for atomicrmw operations and cmpxchg
  that use the i64 type. These are ultimately lowered to masked operations
  using lr.w/sc.w, but we need to use these alternate intrinsics for RV64
  because i32 is not legal
* Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and
  PseudoCmpXchg64
* Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as
  needed for RV64 and to select the i64 intrinsic IDs when necessary
* Adds appropriate patterns to RISCVInstrInfoA.td
* Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support

This ends up being a fairly mechanical change, as the logic for RV32A is
effectively reused.

Differential Revision: https://reviews.llvm.org/D53233

llvm-svn: 351422
2019-01-17 10:04:39 +00:00
Sanjin Sijaric b694030647 [ARM64][Windows] Share unwind codes between epilogues
There are cases where we have multiple epilogues that have the exact same unwind
code sequence.  In that case, the epilogues can share the same unwind codes in
the .xdata section.  This should get us past the assert "SEH unwind data
splitting not yet implemented" in many cases.

We still need to add support for generating multiple .pdata/.xdata sections for
those functions that need to be split into fragments.

Differential Revision: https://reviews.llvm.org/D56813

llvm-svn: 351421
2019-01-17 09:45:17 +00:00
Thomas Lively cbda16eb8e [WebAssembly] Parse llvm.ident into producers section
llvm-svn: 351413
2019-01-17 02:29:55 +00:00
Vedant Kumar a9906c1e5e [MergeFunc] Prevent silent miscompile of vararg functions
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.

I've filed llvm.org/PR40345 to track the issue.

rdar://47326238

llvm-svn: 351411
2019-01-17 02:15:05 +00:00
Thomas Lively 3cfcc94c09 Revert "[WebAssembly] Parse llvm.ident into producers section"
This reverts commit eccdbba3a02a33e13b5262e92200a33e2ead873d.

llvm-svn: 351410
2019-01-17 00:39:49 +00:00
Vedant Kumar e21ab22115 [FunctionComparator] Consider tail call kinds
Essentially, do not treat `call` and `musttail call` as the same thing.

As a drive-by, fold CallInst and InvokeInst handling together using the
CallSite helper.

Differential Revision: https://reviews.llvm.org/D56815

llvm-svn: 351405
2019-01-17 00:29:14 +00:00
Sanjin Sijaric 685565ae9a [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Thomas Lively a56c23c5ba [WebAssembly] Parse llvm.ident into producers section
Summary:
Everything before the word "version" is the tool, and everything after
the word "version" is the version.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56742

llvm-svn: 351399
2019-01-16 23:46:14 +00:00
Jonas Devlieghere 669edb5ce5 [AsmPrinter] Collapse .loc 0 0 directives
Currently we do not always collapse subsequent .loc 0 0 directives. The
reason is that we were checking for a PrevInstLoc which is not set when
we emit a line-0 record. We should only check the LastAsmLine, which
seems to be created exactly for this purpose.

  // When we emit a line-0 record, we don't update PrevInstLoc; so look at
  // the last line number actually emitted, to see if it was line 0.
  unsigned LastAsmLine =
    Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine();

Differential revision: https://reviews.llvm.org/D56767

llvm-svn: 351395
2019-01-16 23:26:29 +00:00
Wei Mi c876e3d42b [PGO] Make pgo related options in opt more consistent.
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.

To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.

Differential Revision: https://reviews.llvm.org/D56749

llvm-svn: 351392
2019-01-16 23:19:02 +00:00
Craig Topper 59abdf5f3f [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv
Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.

This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, its ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts.

We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.

Patch by zhutianyang and Craig Topper

Differential Revision: https://reviews.llvm.org/D56695

llvm-svn: 351381
2019-01-16 21:46:32 +00:00
Changpeng Fang fe9269f804 AMDGPU: Adjust the chain for loads writing to the HI part of a register.
Summary:
  For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part
of the register to maintain the appropriate order.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D56454

llvm-svn: 351379
2019-01-16 21:32:53 +00:00
Craig Topper e5b7cc8aa0 [X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros
If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc.

Differential Revision: https://reviews.llvm.org/D56765

llvm-svn: 351378
2019-01-16 21:29:29 +00:00
Craig Topper 238ad13b3a [X86] Add test case for D56765. NFC
llvm-svn: 351377
2019-01-16 21:29:26 +00:00
Nikita Popov 5fa75a0488 [X86] Add additional saturating add/sub vector tests; NFC
Additional tests for vNi32 and vNi64. I've added these for
usub.sat before, this covers uadd.sat, ssub.sat and sadd.sat.

llvm-svn: 351375
2019-01-16 20:53:23 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Andrea Di Biagio c5f0f5309e [X86][BtVer2] Update latency of horizontal operations.
On Jaguar, horizontal adds/subs have local forwarding disable.
That means, we pay a compulsory extra cycle of write-back stage, and the value
is not available until the end of that stage.

This patch changes the latency of horizontal operations by adding an extra
cycle. With this patch, latency numbers now match what is reported by perf.

I plan to send another patch to also 'fix' the latency of shuffle operations (on
Jaguar, local forwarding is disabled for vector shuffles too).

Differential Revision: https://reviews.llvm.org/D56777

llvm-svn: 351366
2019-01-16 18:18:01 +00:00
Simon Pilgrim d40ab8db30 [X86] Regenerate test
Split check-prefixes to support a future commit

llvm-svn: 351362
2019-01-16 18:00:09 +00:00
Armando Montanez fe7ab3c22e [elfabi] Add support for reading DT_SONAME from binaries
This change gives the llvm-elfabi tool the ability to read DT_SONAME from a binary ELF file into an ELFStub.

Added:

 - DynamicEntries struct for storing dynamic entries that are relevant to elfabi.
 - terminatedSubstr() retrieves a null-terminated substring from a StringRef.
 - appendToError() appends a string to an error, allowing more specific error messages.

Differential Revision: https://reviews.llvm.org/D55629

llvm-svn: 351361
2019-01-16 17:47:16 +00:00
Jeremy Morse 7dcea5ae3b [DebugInfo] Allow creation of DBG_VALUEs in blocks where the operand is not used
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of Phi node and
Values that are used across basic blocks to successfully be translated into
DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D56678

llvm-svn: 351358
2019-01-16 17:25:27 +00:00
Sid Manning 2623870441 [llvm-readobj] Set correct offset when dumping hex section output.
Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351356
2019-01-16 16:17:19 +00:00
Sanjay Patel 546c9f6d8f [x86] add tests for extracted scalar casts (PR39974); NFC
https://bugs.llvm.org/show_bug.cgi?id=39974

llvm-svn: 351354
2019-01-16 16:11:30 +00:00
Marek Olsak c5cec5e1fa AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52944

llvm-svn: 351351
2019-01-16 15:43:53 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. Patch fixes this by checking if
the node is the reduction node and thus it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Saurabh Badhwar 148569f7a6 [llvm-nm] Allow --size-sort to print symbols with only Symbol size
Summary:
When llvm-nm is passed only the --size-sort option for an object file, there is no output generated.
The commit modifies the behavior to print the symbols sorted and their size which is also inline with
the output of the GNU nm tool.

Signed-off-by: Saurabh Badhwar <sbsaurabhbadhwar9@gmail.com>

Reviewers: enderby, rupprecht

Reviewed By: rupprecht

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56063

llvm-svn: 351347
2019-01-16 14:38:22 +00:00
Sanjay Patel 0dbecd05ed [x86] lower shuffle of extracts to AVX2 vperm instructions
I was trying to prevent shuffle regressions while matching more horizontal ops 
and ended up here:
  shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0

The affected tests were added for:
https://bugs.llvm.org/show_bug.cgi?id=34380

This patch won't change the examples in the bug report itself, but we should be 
able to extend this to catch more types.

Differential Revision: https://reviews.llvm.org/D56756

llvm-svn: 351346
2019-01-16 14:15:18 +00:00
Anton Korobeynikov cbdb4effae [MSP430] Emit a separate section for every interrupt vector
This is LLVM part of D56663

Linker scripts shipped by TI require to have every
interrupt vector in a separate section with a specific name:

 SECTIONS
 {
   __interrupt_vector_XX   : { KEEP (*(__interrupt_vector_XX )) } > VECTXX
   ...
 }

Follow the requirement emit the section for every vector
which contain address of interrupt handler:

  .section  __interrupt_vector_XX,"ax",@progbits
  .word %isr%

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56664

llvm-svn: 351345
2019-01-16 14:03:41 +00:00
Simon Pilgrim 1bdc049767 [X86][SSE] Add additional PR40318 shuffle test cases
llvm-svn: 351333
2019-01-16 13:15:59 +00:00
Gabor Buella 3ec170c85a Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker
For the given test SROA detects possible replacement and creates a correct alloca. After that SROA is adding lifetime markers for this new alloca. The function getNewAllocaSlicePtr is trying to deduce the pointer type based on the original alloca, which is split, to use it later in lifetime intrinsic.

For the test we ended up with such code (rA is initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation)

```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*    <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```

isAllocaPromotable code assumes that a user of alloca may go into lifetime marker through bitcast but it must be the only one bitcast to i8* type. In the test it's not a i8* type, return false and throw the assertion.

As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation.

The test is a greatly simplified to just reproduce the assertion.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewers: chandlerc, craig.topper

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D55934

llvm-svn: 351325
2019-01-16 12:06:17 +00:00
Philip Pfaffe 81101de585 [MSan] Apply the ctor creation scheme of TSan
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56734

llvm-svn: 351322
2019-01-16 11:14:07 +00:00
Philip Pfaffe 685c76d7a3 [NewPM][TSan] Reiterate the TSan port
Summary:
Second iteration of D56433 which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk which should appease dyld, but is not
actually OK wrt. the current semantics of function passes. Hence, add a
helper to insert the functions only on the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56538

llvm-svn: 351314
2019-01-16 09:28:01 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
Martin Storsjo 58bb0e47dc [llvm-rc] Support '--' for delimiting options from input paths
This allows avoiding conflicts between paths that begin with the same
chars as some llvm-rc options (which can be used with either slashes
or dashes).

Differential Revision: https://reviews.llvm.org/D56743

llvm-svn: 351305
2019-01-16 08:09:22 +00:00
Dmitry Venikov d3f21d3a08 [llvm-symbolizer] Add -C as a short alias to -demangle
Summary: Provides -C as alias to -demangle. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40069.

Reviewers: jhenderson, ruiu, rnk, fjricci

Reviewed By: jhenderson, ruiu

Subscribers: rupprecht, erik.pilkington, llvm-commits

Differential Revision: https://reviews.llvm.org/D56591

llvm-svn: 351300
2019-01-16 07:05:58 +00:00
Tom Stellard 3d36e5c3e6 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatiblity rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Serguei Katkov a5b0e5585b [InstCombine]Avoid introduction of unaligned mem access
InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair.
It might result in generation of unaligned atomic load/store which later in backend
will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is
and handle it at backend.

Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582

llvm-svn: 351295
2019-01-16 04:36:26 +00:00
Sam Clegg 56c587adfd [WebAssembly] Store section alignment as a power of 2
This change bumps for version number of the wasm object file
metadata.

See https://github.com/WebAssembly/tool-conventions/pull/92

Differential Revision: https://reviews.llvm.org/D56758

llvm-svn: 351285
2019-01-16 01:34:48 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Mandeep Singh Grang 436735c3fe [EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer D53541 for the context. Clang counterpart D56748.

Reviewers: rnk, efriedma

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56747

llvm-svn: 351281
2019-01-16 00:37:13 +00:00
Craig Topper 34ac509ac8 [X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer.
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways.

llvm-svn: 351275
2019-01-15 23:36:25 +00:00
Changpeng Fang 20fe3d2f35 AMDGPU: Raise the priority of MAD24 in instruction selection.
Summary:
  We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern
is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly
estimate register pressure during instruction selection, we can give mad a higher priority.

In this work, we raise the priority for mad24 in selection and resolve the performance regression.

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D56745

llvm-svn: 351273
2019-01-15 23:12:36 +00:00
Jordan Rupprecht 20a817ea2a [libObject] Tweak expected error output from llvm-ar
llvm-svn: 351259
2019-01-15 22:03:08 +00:00
Jordan Rupprecht 904ce9847d [llvm-ar] Resubmit recursive thin archive test with fix for full path names and better error messages
llvm-svn: 351256
2019-01-15 21:52:31 +00:00
Roman Lebedev fb4eed381d X86DAGToDAGISel::matchBitExtract() with truncation (PR36419)
Summary:
Previously in D54095 i have added support for extraction of `lshr` from `X` if we are to produce `BEXTR`.
That was good, but the fix was partial, there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].

That pattern can also appear, roughly, when you have a large (64-bit) storage, and the consume bits from it.
It will not be unexpected if you will be doing further computations in 32-bit width.
And then the current code breaks, as the tests show.

The basic idea/pattern here is following:
1. We have `i64` input
2. We perform `i64` right-shift on it.
3. We `trunc`ate that shifted value
4. We do all further work (masking) in `i32`

Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift.
BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64`
and truncate the result after masking: https://rise4fun.com/Alive/K4B
```
Name: @bextr64_32_b1 -> @bextr64_32_b0
  %shiftedval = lshr i64 %val, %numskipbits
  %truncshiftedval = trunc i64 %shiftedval to i32
  %widenumlowbits1 = zext i8 %numlowbits to i32
  %notmask1 = shl nsw i32 -1, %widenumlowbits1
  %mask1 = xor i32 %notmask1, -1
  %res = and i32 %truncshiftedval, %mask1
=>
  %shiftedval = lshr i64 %val, %numskipbits
  %widenumlowbits = zext i8 %numlowbits to i64
  %notmask = shl nsw i64 -1, %widenumlowbits
  %mask = xor i64 %notmask, -1
  %wideres = and i64 %shiftedval, %mask
  %res = trunc i64 %wideres to i32
```

Thus, we are again able to extract that `lshr` into `BEXTR`'s control.

Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea:
```
$ cat /tmp/old.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RSI
# LLVM-EXEGESIS-LIVEIN EDX
# LLVM-EXEGESIS-LIVEIN RDI
movq %rsi, %rcx
shrq %cl, %rdi
shll $8, %edx
bextrl %edx, %edi, %eax
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
---
mode:            latency
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
---
mode:            uops
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
```
vs
```
$ cat /tmp/new.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RDX
# LLVM-EXEGESIS-LIVEIN SIL
# LLVM-EXEGESIS-LIVEIN RDI
shlq $8, %rdx
movzbl %sil, %eax
orq %rdx, %rax
bextrq %rax, %rdi, %rax
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
---
mode:            latency
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
---
mode:            uops
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
```
^ latency increased (worse).

Except //maybe// not really.
Like with all synthetic benchmarks, they //may// be misleading.

Let's take a look on some actual real-world hotpath.
In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ e3316dc851/src/librawspeed/decompressors/VC5Decompressor.cpp (L814) | GoPro VC5 decompressor ]]:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
2018-12-22 21:23:03
Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
Run on (8 X 4012.81 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.41, 2.41, 2.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           40 ms         40 ms        128   0.322244          7.96974        12M       37.4457M        298.534M      3.12047       24.8778   0.040465
GOPR9172.GPR/threads:8/real_time_median         39 ms         39 ms        128   0.312606          7.99155        12M        38.387M        306.788M      3.19891       25.5656   0.039115
GOPR9172.GPR/threads:8/real_time_stddev          4 ms          3 ms        128  0.0271557         0.130575          0        2.4941M        21.3909M     0.207842       1.78257   3.81081m
RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
2018-12-22 21:23:08
Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Run on (8 X 4013.1 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.78, 2.50, 2.06
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           39 ms         39 ms        128   0.311533          7.97323        12M       38.6828M        308.471M      3.22356        25.706  0.0390928
GOPR9172.GPR/threads:8/real_time_median         38 ms         38 ms        128   0.304231          7.99005        12M       39.4437M        315.527M      3.28698        26.294  0.0380316
GOPR9172.GPR/threads:8/real_time_stddev          3 ms          3 ms        128  0.0229149         0.133814          0       2.26225M        19.1421M     0.188521       1.59517   3.13671m
Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
GOPR9172.GPR/threads:8/real_time_mean                  -0.0339         -0.0316            40            39            40            39
GOPR9172.GPR/threads:8/real_time_median                -0.0277         -0.0274            39            38            39            38
GOPR9172.GPR/threads:8/real_time_stddev                -0.1769         -0.1267             4             3             3             3
```
I.e. this results in //roughly// -3% improvements in perf.

While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.

Reviewers: RKSimon, craig.topper, andreadb, spatel

Reviewed By: craig.topper

Subscribers: courbet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56052

llvm-svn: 351253
2019-01-15 21:31:18 +00:00
David Callahan d129d3e93f treat invoke like call
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
import when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected wtih different link time optimizations

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56491

llvm-svn: 351251
2019-01-15 21:26:51 +00:00
Matt Morehouse 19ff35c481 [SanitizerCoverage] Don't create comdat for interposable functions.
Summary:
Comdat groups override weak symbol behavior, allowing the linker to keep
the comdats for weak symbols in favor of comdats for strong symbols.

Fixes the issue described in:
https://bugs.chromium.org/p/chromium/issues/detail?id=918662

Reviewers: eugenis, pcc, rnk

Reviewed By: pcc, rnk

Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56516

llvm-svn: 351247
2019-01-15 21:21:01 +00:00
Alexey Bataev 9514b1c6b4 [SLP] Added test for PR40310, NFC.
llvm-svn: 351240
2019-01-15 20:54:44 +00:00
Michael Trent 7e6602110b llvm-objdump -m -D should disassemble all text segments
Summary:
When running llvm-objdump with the -macho option objdump will by default
disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when
disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is
treated no diferently than -disassemble.

This change upates llvm-objdump's MachO parsing code to disassemble all
__text sections found in a file when -disassemble-all is specified. This
is useful for disassembling files with more than one __text section, or
when disassembling files whose __text section is not present in __TEXT.

I added a lit test case that verifies "llvm-objdump -m -d" and 
"llvm-objdump -m -D" produce the expected results on a reference binary. 
I also updated the CommandGuide documentation for llvm-objdump.rst and
verified it renders correctly as man and html.

rdar://42899338

Reviewers: ab, pete, lhames

Reviewed By: lhames

Subscribers: rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D56649

llvm-svn: 351238
2019-01-15 20:41:30 +00:00
Craig Topper 82015b633b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.

I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.

I'll work on scatter and gatherpf/scatterpf next.

Differential Revision: https://reviews.llvm.org/D56527

llvm-svn: 351234
2019-01-15 20:12:33 +00:00
Anton Korobeynikov c9e9e28487 [MSP430] Recognize '{' as a line separator
msp430-as supports multiple assembly statements on the same line
separated by a '{' character.

llvm-svn: 351233
2019-01-15 20:10:46 +00:00
Craig Topper 99fcbf67d0 [Nios2] Remove Nios2 backend
As mentioned here http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html This backend is incomplete and has not been maintained in several months.

Differential Revision: https://reviews.llvm.org/D56691

llvm-svn: 351231
2019-01-15 19:59:19 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Yury Delendik be24c02003 [WebAssembly] Fix updating/moving DBG_VALUEs in RegStackify
Summary:
As described in PR40209, there can be issues in DBG_VALUEs handling when multiple defs present in a BB. This patch
adds logic for detection of related to def DBG_VALUEs and localizes register update and movement to found DBG_VALUEs.

Reviewers: aheejin

Subscribers: mgorny, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56401

llvm-svn: 351216
2019-01-15 18:14:12 +00:00
Jordan Rupprecht 58aac95081 [llvm-readelf] Allow single-letter flags to be merged.
Summary:
This patch adds support for merged arguments (e.g. -SW == -S -W) for llvm-readelf.

No changes are intended for llvm-readobj. There are a few short flags (-sd, -sr, -st, -dt) that would conflict with grouped single letter flags, and having only some grouped flags might be confusing. So, allow merged flags for readelf compatibility, but force separate args for llvm-readobj. From what I can tell, these two-letter flags are only used with llvm-readobj, not llvm-readelf.

This fixes PR40064.

Reviewers: jhenderson, kristina, echristo, phosek

Reviewed By: jhenderson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56629

llvm-svn: 351205
2019-01-15 17:04:40 +00:00
Jordan Rupprecht 17dd4a2c5e [llvm-objcopy] Use SHT_NOTE for added note sections.
Summary:
Fix llvm-objcopy to add .note sections as SHT_NOTEs. GNU objcopy overrides section flags for special sections. For `.note` sections (with the exception of `.note.GNU-stack`), SHT_NOTE is used.

Many other sections are special cased by libbfd, but `.note` is the only special section I can seem to find being used with objcopy --add-section.

See `.note` in context of the full list of special sections here: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf.c;h=eb3e1828e9c651678b95a1dcbc3b124783d1d2be;hb=HEAD#l2675

Reviewers: jhenderson, alexshap, jakehehrlich, espindola

Reviewed By: jhenderson

Subscribers: emaste, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D56570

llvm-svn: 351204
2019-01-15 16:57:23 +00:00
Simon Pilgrim b8f08c8d7b [X86] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

We were already doing this for SSSE3 targets as we have PSHUFB, but its worth doing for all targets.

llvm-svn: 351203
2019-01-15 16:56:55 +00:00
Simon Pilgrim fa7a8c7ddc [X86] Add PR40318 shuffle test case
The other test case is already covered by the PR40306 test case, which was mainly concerned with SSSE3 codegen.

llvm-svn: 351201
2019-01-15 16:31:10 +00:00
James Y Knight 693d39dd12 Remove irrelevant references to legacy git repositories from
compiler identification lines in test-cases.

(Doing so only because it's then easier to search for references which
are actually important and need fixing.)

llvm-svn: 351200
2019-01-15 16:18:52 +00:00
Simon Pilgrim 4e38b8f8bd [SLP][X86] Split prefer-256-bit 'AVX256BW' tests from AVX2 checks
Fixes SLP test issue with D56636

llvm-svn: 351199
2019-01-15 16:13:37 +00:00
Sanjay Patel fad5bdaf95 [DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are 
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, 
we're avoiding the vector-scalar-vector moves in favor of vector ops.

We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.

Differential Revision: https://reviews.llvm.org/D56281

llvm-svn: 351198
2019-01-15 16:11:05 +00:00
Florian Hahn 4094f34f78 [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.
Otherwise instcombine gets stuck in a cycle. The canonicalization was
added in D55961.

This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400

llvm-svn: 351187
2019-01-15 11:18:21 +00:00
Max Kazantsev 80242ee87e [NFC] Remove obsolete enum RangeCheckKind
llvm-svn: 351183
2019-01-15 10:48:45 +00:00
Roman Lebedev 58000f804a [NFC][X86] extract-bits.ll: add test with truncation with extra-use.
That extra-use *should* prevent D56052 from looking past the trunc.

llvm-svn: 351182
2019-01-15 10:36:20 +00:00
Martin Storsjo f51f5ea6d5 [llvm-objcopy] [COFF] Implement --strip-all[-gnu] for symbols
Differential Revision: https://reviews.llvm.org/D56481

llvm-svn: 351174
2019-01-15 09:34:55 +00:00
Martin Storsjo e30487ca26 [llvm-objcopy] [COFF] Remove pointless comment chars from .test files. NFC.
llvm-svn: 351173
2019-01-15 09:34:45 +00:00
Craig Topper 2581249f05 [X86] Upgrade some avx512bw shift intrinsics that were removed a while ago. NFC
Masking was removed from these intrinsics and I guess we didn't update the tests then.

llvm-svn: 351165
2019-01-15 07:15:20 +00:00
Craig Topper 3c5423b26e [X86] Add test cases for D56695. NFC
llvm-svn: 351162
2019-01-15 06:39:51 +00:00
Craig Topper a3cfdcc21f [X86] Switch the triple on avx2-intrinsics-x86.ll to be -unknown-unknown instead of darwin so the constant pool entries will be filtered better by the script.
Darwin uses LCPI instead of .LCPI so the filter doesn't work.

This is silly, but it will help reduce some future some test diffs.

llvm-svn: 351161
2019-01-15 06:39:49 +00:00
Thomas Lively 6bf2b40051 [WebAssembly] Expand SIMD shifts while V8's implementation disagrees
Summary:
V8 currently implements SIMD shifts as taking an immediate operation,
which disagrees with the spec proposal and the toolchain
implementation. As a stopgap measure to get things working, unroll all
vector shifts. Since this is a temporary measure, there are no tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56520

llvm-svn: 351151
2019-01-15 02:16:03 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Reid Kleckner fe5e5dcab0 [X86] Avoid clobbering ESP/RSP in the epilogue.
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.

Fixes PR40289.

Patch by Geoffry Song!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D56617

llvm-svn: 351146
2019-01-15 01:24:18 +00:00
Evandro Menezes 111033d917 [AArch64] Fix typo (NFC)
Fix another typo, this time in the `RUN` line, which used a syntax not
universally supported, in test case added by D56572.

llvm-svn: 351144
2019-01-15 00:58:59 +00:00
Evandro Menezes 4b6c290fbd [AArch64] Fix typo (NFC)
Fix typo in test case added by D56572 (rL351139).

llvm-svn: 351143
2019-01-15 00:20:57 +00:00
Eli Friedman 295e346da4 [EarlyIfConversion] Don't if-convert unconditional branches.
A block ending in an unconditional branch can have two successors if one
is a landing pad.  In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.

(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)

Differential Revision: https://reviews.llvm.org/D56468

llvm-svn: 351142
2019-01-15 00:19:46 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
James Y Knight 786558882c Update GettingStarted guide to recommend that people use the new
official Git repository.

Remove the directions for using git-svn, and demote the prominence of
the svn instructions.

Also, fix a few other issues while I'm in there:

* Mention LLVM_ENABLE_PROJECTS more.
* Getting started doesn't need to mention test-suite, but should
  mention clang and the other projects.
* Remove mentions of "configure", since that's long gone.

I've also adjusted a few other mentions of svn to point to github, but
have not done so comprehensively.

Differential Revision: https://reviews.llvm.org/D56654

llvm-svn: 351130
2019-01-14 22:27:32 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Thomas Lively 2a47e03ee4 [WebAssembly][FastISel] Do not assume naive CmpInst lowering
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See
test/CodeGen/WebAssembly/PR40172.ll for an explanation.

Reviewers: dschuff, aheejin

Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D56457

llvm-svn: 351127
2019-01-14 22:03:43 +00:00
Jordan Rupprecht f3857f33c3 [llvm-ar] Temporarily remove failing test which is breaking buildbots
llvm-svn: 351126
2019-01-14 21:58:15 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Jordan Rupprecht 93bfb99ffc [llvm-ar] Flatten thin archives.
Summary:
Normal behavior for GNU ar is to flatten thin archives when adding them to another thin archive, i.e. add the members directly instead of nesting the archive.

Some refactoring done as part of this patch to ease things:
 - Consolidate `addMember`/`addLibMember` methods
 - Rename `addMember` to `addChildMember` to make it more visibly different at the call site that an archive child is passed instead of a regular member
 - Pass in a separate vector and splice it back into position instead of passing a vector + optional Pos (which makes expanding libs tricky)

This fixes PR37530 as raised by https://github.com/ClangBuiltLinux/linux/issues/279.

Reviewers: mstorsjo, pcc, ruiu

Reviewed By: mstorsjo

Subscribers: llvm-commits, tpimh, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D56508

llvm-svn: 351120
2019-01-14 21:11:46 +00:00
Jonathan Metzman e159a0dd1a [SanitizerCoverage][NFC] Use appendToUsed instead of include
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.

Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.

Reviewers: pcc

Reviewed By: pcc

Subscribers: hiraditya

Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351118
2019-01-14 21:02:02 +00:00
Simon Pilgrim bfe2ee453a [X86][SSSE3] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

llvm-svn: 351103
2019-01-14 19:07:26 +00:00
Martin Storsjo 4b0694b712 [llvm-objcopy] [COFF] Remove unreferenced undefined externals with --strip-unneeded.
Differential Revision: https://reviews.llvm.org/D56660

llvm-svn: 351099
2019-01-14 18:56:47 +00:00
Martin Storsjo 4707fb65d4 [llvm-objcopy] [COFF] Test absolute symbols wrt --strip-unneeded and --discard-all. NFC.
Differential Revision: https://reviews.llvm.org/D56659

llvm-svn: 351098
2019-01-14 18:56:27 +00:00
Nirav Dave a0946dc740 [MC][X86] Add test case for invalid use of "(%dx)" operand.
llvm-svn: 351094
2019-01-14 18:44:32 +00:00
Sanjay Patel b23ff7a0e2 [x86] lower extracted add/sub to horizontal vector math
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0

This is the integer sibling to D56011.

There's an additional restriction to only to do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more 
beneficial than this minimal case. An improvement to the more general 
h-op lowering may allow us to remove the restriction here in a follow-up.

llvm-svn: 351093
2019-01-14 18:44:02 +00:00
Dan Gohman 8da641455c [WebAssembly] Remove tests for old intrinsics.
This is a followup to r351084 which removes the tests for the old
intrinsic names.

llvm-svn: 351086
2019-01-14 18:25:29 +00:00
Simon Pilgrim 6c35c8e2e8 [X86] Add PR40306 shuffle test case
llvm-svn: 351078
2019-01-14 17:49:11 +00:00
Simon Pilgrim a1bd4a6ba4 [DAGCombiner] Add (sub_sat x, x) -> 0 combine
llvm-svn: 351073
2019-01-14 15:43:34 +00:00
Simon Pilgrim fa1f518748 [DAGCombiner] Enable sub saturation constant folding
llvm-svn: 351072
2019-01-14 15:28:53 +00:00
Simon Pilgrim 8c2e9e1fef [X86] Add sub saturation constant folding and self tests.
llvm-svn: 351071
2019-01-14 15:08:51 +00:00
Simon Pilgrim 7fc6882374 [DAGCombiner] Add add/sub saturation undef handling
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0

llvm-svn: 351070
2019-01-14 14:16:24 +00:00
Petar Avramovic b469877749 [MIPS GlobalISel] Fix release build make-check after r351046
Add 'REQUIRES: asserts' to test that uses debug output in
order to fix r351046 for buildbots that use release build.

llvm-svn: 351068
2019-01-14 14:12:43 +00:00
Simon Pilgrim a367285f64 [X86] Add add/sub saturation undef tests.
llvm-svn: 351066
2019-01-14 13:47:07 +00:00
Simon Pilgrim cfa5f06dde [DAGCombiner] Enable add saturation constant folding
llvm-svn: 351060
2019-01-14 12:34:31 +00:00
Aleksandar Beserminji 4c4c0377ca [mips] Optimize shifts for types larger than GPR size (mips2/mips3)
With this patch, shifts are lowered to optimal number of instructions
necessary to shift types larger than the general purpose register size.

This resolves PR/32293.

Thanks to Kyle Butt for reporting the issue!

Differential Revision: https://reviews.llvm.org/D56320

llvm-svn: 351059
2019-01-14 12:28:51 +00:00
Jeremy Morse f216da7ee0 [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.

This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).

Differential Revision: https://reviews.llvm.org/D55272

llvm-svn: 351058
2019-01-14 12:13:12 +00:00
Simon Pilgrim 67610926fc [DAGCombiner] Add add saturation constant folding tests.
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.

llvm-svn: 351057
2019-01-14 12:12:42 +00:00
Diana Picus 8987d00653 [ARM GlobalISel] Import MOVi32imm into GlobalISel
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.

We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.

llvm-svn: 351056
2019-01-14 12:04:08 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Francis Visoiu Mistrih b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
Petar Avramovic 7d370a36bb [MIPS GlobalISel] Add pre legalizer combiner pass
Introduce GlobalISel pre legalizer pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.

Differential Revision: https://reviews.llvm.org/D56269

llvm-svn: 351046
2019-01-14 10:27:05 +00:00
Dmitry Venikov 5c1768fc57 [llvm-symbolizer] Add -addresses, -a as aliases for -print-address
Summary: Provides -addresses, -a as aliases for -print-address. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40067.

Reviewers: jhenderson, ruiu, rnk, fjricci

Reviewed By: jhenderson

Subscribers: rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D56635

llvm-svn: 351043
2019-01-14 10:10:51 +00:00
Thomas Preud'homme ca28a4651e Fix defines.txt
Support arbitrary suffix when matching FileCheck executable name in
defines.txt to successfully match FileCheck.EXE on Microsoft Windows.

llvm-svn: 351042
2019-01-14 10:10:48 +00:00
Thomas Preud'homme 84f4ff5119 Detect incorrect FileCheck variable CLI definition
Summary:
While the backend code of FileCheck relies on definition of variable
from the command-line to have an equal sign '=' and a variable name
before that, the frontend does not actually enforce it. This leads to
FileCheck crashing when invoked with invalid syntax for the -D option.

This patch adds the missing validation in the frontend. It also makes
the -D option an AlwaysPrefix option to be able to detect -D=FOO as
being a define without variable and -D as missing its value.

Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions)

Reviewers: jdenny

Subscribers: JonChesterfield, hiraditya, kristina, probinson,
llvm-commits

Differential Revision: https://reviews.llvm.org/D55940

llvm-svn: 351039
2019-01-14 09:29:10 +00:00
Craig Topper e7b4ea4726 [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351035
2019-01-14 08:46:45 +00:00
Craig Topper 6149363515 [X86] Add new test file that was supposed to go with r351028.
llvm-svn: 351034
2019-01-14 08:46:42 +00:00
Craig Topper 3f3b8ef442 [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
The input mask can be represented with an AND in IR.

Fixes PR40258

llvm-svn: 351028
2019-01-14 00:03:50 +00:00
Simon Pilgrim 56ba1db933 [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
2019-01-13 22:08:26 +00:00
Simon Pilgrim 897d4c6fe9 [DAGCombiner] Some very basic add/sub saturation combines.
Handle combines with zero and constant canonicalization for adds.

llvm-svn: 351024
2019-01-13 21:50:24 +00:00
Simon Pilgrim 9961c55e28 [X86] Add some basic add/sub saturation combine tests.
The actual combines will be added in a future commit.

llvm-svn: 351023
2019-01-13 21:21:46 +00:00
Simon Pilgrim a0069ba0db [X86] More aggressive shuffle mask widening in combineExtractWithShuffle
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.

llvm-svn: 351013
2019-01-12 16:38:56 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
George Rimar 9b6fe7e3a2 [llvm-objdump] - Change the output for --all-headers.
This is for https://bugs.llvm.org/show_bug.cgi?id=40008,

it starts printing the file headers when --all-headers is given and
do a minor cosmetic change.

Differential revision: https://reviews.llvm.org/D56588

llvm-svn: 351006
2019-01-12 12:17:24 +00:00
Nikita Popov 537b319860 [X86] Add more usub.sat vector tests; NFC
Add additional vXi32 and vXi64 tests.

llvm-svn: 351003
2019-01-12 11:43:04 +00:00
Simon Pilgrim a21e2bd682 [X86] Improve vXi64 ISD::ABS codegen with SSE41+
Make use of vblendvpd to select on the signbit

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350999
2019-01-12 10:28:12 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Craig Topper 33b2cf50e3 [X86] Add ISD node for masked version of CVTPS2PH.
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.

This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.

Fixes another instruction for PR34877.

llvm-svn: 350994
2019-01-12 08:05:12 +00:00
Alex Bradbury 61aa940074 [RISCV] Introduce codegen patterns for RV64M-only instructions
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.

Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).

Differential Revision: https://reviews.llvm.org/D53230

llvm-svn: 350993
2019-01-12 07:43:06 +00:00
Alex Bradbury d05eae7a7b [RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>

Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.

There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.

Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.

Differential Revision: https://reviews.llvm.org/D56264

llvm-svn: 350992
2019-01-12 07:32:31 +00:00
Craig Topper bf61525e8c [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type.

By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.

llvm-svn: 350989
2019-01-12 02:22:10 +00:00
Craig Topper abe6ef8d09 [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.

This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.

This fixes some of PR34877

llvm-svn: 350985
2019-01-12 00:55:27 +00:00
Nikita Popov 9f6e9cf71b [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general cas
e of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Alexey Bataev 8da9a7538e [SLP]Moved NVPTX test under NVPTX directory, NFC.
llvm-svn: 350969
2019-01-11 20:42:48 +00:00
Alexey Bataev ce2c8b3360 [SLP]Update test checks for the SPL vectorizer, NFC.
llvm-svn: 350967
2019-01-11 20:21:14 +00:00
Nirav Dave 6b7f5aac72 [X86] Fix incomplete handling of register-assigned variables in parsing.
Teach x86 assembly operand parsing to distinguish between assembler
variable assigned to named registers and those assigned to immediate
values.

Reviewers: rnk, nickdesaulniers, void

Subscribers: hiraditya, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D56287

llvm-svn: 350966
2019-01-11 20:17:36 +00:00
Alex Bradbury eea0b07028 [RISCV][NFC] Add CHECK lines for atomic operations on RV64I
As or RV32I, we include these for completeness. Committing now to make it
easier to review the RV64A patch.

llvm-svn: 350962
2019-01-11 19:46:48 +00:00
Evandro Menezes 946fe976fd [llvm-mca] Update tests for Exynos (NFC)
Update test cases for Exynos M4.

llvm-svn: 350961
2019-01-11 19:36:27 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Pirama Arumuga Nainar cc07dabdaa [Legalizer] Use correct ValueType of SELECT_CC node during Float promotion
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.

Fix PR40273.

Reviewers: efriedma, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56566

llvm-svn: 350951
2019-01-11 18:46:02 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Jordan Rupprecht 298ea3f577 [llvm-objcopy][NFC] Consistenly use two dashes for flags in tests.
Summary:
As pointed out in D53667, our use of hyphens in flags can be inconsistent, mixing `-` with `--`. This change makes all long style flags use `--`.

Automatically changed via:

```
find test/tools/llvm-objcopy/ELF -type f | xargs sed -i 's/ -\([a-zA-Z]\{3\}\)/ --\1/g'
```

Two false positives were manually fixed/reverted.

Reviewers: jhenderson, espindola, alexshap

Reviewed By: jhenderson

Subscribers: emaste, javed.absar, arichardson, fedor.sergeev, jakehehrlich, llvm-commits

Differential Revision: https://reviews.llvm.org/D56513

llvm-svn: 350944
2019-01-11 18:06:31 +00:00
Vedant Kumar ee10ef737e [MergeFunc] Erase unused duplicate functions if they are discardable
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.

Differential Revision: https://reviews.llvm.org/D56574

llvm-svn: 350939
2019-01-11 17:56:35 +00:00
Ehsan Amiri f452f116d2 [Jump Threading] Unfold a select insn that feeds a switch via a phi node
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select 
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a 
switch (via a phi node). The patch unfolds select under this condition. 
A testcase is provided.

llvm-svn: 350931
2019-01-11 15:52:57 +00:00
Sanjay Patel 40cd4b77e9 [x86] allow insert/extract when matching horizontal ops
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.

There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.

llvm-svn: 350928
2019-01-11 14:27:59 +00:00
Martin Storsjo fb909207c6 [llvm-objcopy] [COFF] Implmement --strip-unneeded and -x/--discard-all for symbols
Differential Revision: https://reviews.llvm.org/D56480

llvm-svn: 350927
2019-01-11 14:13:04 +00:00
Martin Storsjo d1cc64fe12 [llvm-objcopy] [COFF] Fix writing object files without symbols/string table
Previously, this was broken - by setting PointerToSymbolTable to zero
but still actually writing the string table length, the object file
header was corrupted.

Differential Revision: https://reviews.llvm.org/D56584

llvm-svn: 350926
2019-01-11 13:47:37 +00:00
Dmitry Venikov 37c1e2e7a9 [llvm-symbolizer] Add -exe, -e as aliases to -obj
Summary: Provides -exe, -e as aliases to -obj. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40071

Reviewers: ruiu, rnk, fjricci, jhenderson

Reviewed By: jhenderson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56580

llvm-svn: 350925
2019-01-11 11:51:52 +00:00
Craig Topper b97885cc2e [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.

Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.

This fixes some of the regressions from D350800.

llvm-svn: 350918
2019-01-11 05:44:56 +00:00
Francis Visoiu Mistrih f57a247df7 [llvm-objdump][MachO] Disable some invalid input tests
It causes some (but not all) bots to fail. I'll look into it tomorrow
morning. Remove the tests for now to make the bots green.

llvm-svn: 350908
2019-01-10 23:46:31 +00:00
Heejin Ahn e73c7a1ab2 [WebAssembly] Fix stack pointer store check in RegStackify
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.

This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitraily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.

Reviewers: dschuff, sunfish

Subscribers: jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D56094

llvm-svn: 350906
2019-01-10 23:12:07 +00:00
Anton Korobeynikov 0681d6bc90 [MSP430] Minor fixes/improvements for assembler/disassembler
* Teach AsmParser to recognize @rn in distination operand as 0(rn).
* Do not allow Disassembler decoding instructions that have size more
  than a number of input bytes.
* Fix UB in MSP430MCCodeEmitter.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56547

llvm-svn: 350903
2019-01-10 22:59:50 +00:00
Anton Korobeynikov 29ffb6d558 [MSP430] Add missing instruction forms
* Add missing mm, [r|m]n, [r|m]p instruction forms.
* Fix bit16mc instruction.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56546

llvm-svn: 350902
2019-01-10 22:54:53 +00:00
Thomas Lively 64a39a1c4e [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56560

llvm-svn: 350901
2019-01-10 22:32:11 +00:00
Martin Storsjo 44aefe0bfb [llvm-objcopy] [COFF] Fix a test matching pathnames for Windows. NFC.
llvm-svn: 350899
2019-01-10 22:05:21 +00:00
Martin Storsjo 10b7296484 [llvm-objcopy] [COFF] Add support for removing symbols
Differential Revision: https://reviews.llvm.org/D55881

llvm-svn: 350893
2019-01-10 21:28:24 +00:00
Alina Sbirlea cae12edaaa Use MemorySSA in LICM to do sinking and hoisting.
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Reviewers: sanjoy, davide, gberry, george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40375

llvm-svn: 350879
2019-01-10 19:29:04 +00:00
Craig Topper 844f989608 [X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.

This should help some of the regressions from D56387

Differential Revision: https://reviews.llvm.org/D56421

llvm-svn: 350875
2019-01-10 19:05:34 +00:00
Francis Visoiu Mistrih 1bd054ead2 [llvm-objdump][MachO] Fix test to work on Windows
This fails in http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/3208/steps/stage%201%20check/logs/stdio.

llvm-svn: 350871
2019-01-10 18:32:30 +00:00
Francis Visoiu Mistrih b8819dc1e3 [llvm-objdump][MachO] Fix error reporting after r350848 and r350849
llvm-svn: 350851
2019-01-10 17:36:54 +00:00
Dan Liew 4d8c8fe62e [FileCheck] Don't propagate `FILECHECK_DUMP_INPUT_ON_FAILURE` and
`FILECHECK_OPTS` into environment for FileCheck tests.

Summary:

This fixes the following FileCheck tests:

* FileCheck/dump-input-enable.txt
* FileCheck/match-full-lines.txt

when `FILECHECK_DUMP_INPUT_ON_FAILURE` is set in the environment.

By default llvm-lit propagates `FILECHECK_DUMP_INPUT_ON_FAILURE` and
`FILECHECK_OPTS` from llvm-lit's environment into the test environment.
Unfortunately this can break FileCheck's tests because they expect that
these environment variables not to be set.

rdar://problem/47176262

Reviewers: jdenny, probinson, george.karpenkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56541

llvm-svn: 350850
2019-01-10 17:24:06 +00:00
Francis Visoiu Mistrih 1b427d4dba [llvm-objdump][MachO] Use the -dsym file name when reporting errors
Instead of using the binary filename.

llvm-svn: 350849
2019-01-10 17:16:42 +00:00
Francis Visoiu Mistrih 9f4f01182e [llvm-objdump][MachO] Correctly handle the llvm::Error when -dsym has errors
In an assert build, the Error gets destroyed and we get "Program aborted
due to an unhandled Error:".

In release, we get an empty message.

llvm-svn: 350848
2019-01-10 17:16:37 +00:00
George Rimar 8e0a70be24 [llvm-objdump] - Do not include reserved undefined symbol in -t output.
This is https://bugs.llvm.org/show_bug.cgi?id=26892,

GNU objdump hides the special symbol entry:

SYMBOL TABLE:
000000000000a7e0 l     F .text	00000000000003f9 bi_copymodules
while llvm-objdump does not:

SYMBOL TABLE:
0000000000000000         *UND*		 00000000 
000000000000a7e0 l     F .text		 000003f9 bi_copymodules

Patch makes the behavior of the llvm-objdump to be consistent with the GNU objdump.

Differential revision: https://reviews.llvm.org/D56076

llvm-svn: 350840
2019-01-10 16:24:10 +00:00
Neil Henning e85d45a699 [AMDGPU] Fix dwordx3/southern-islands failures.
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.

Differential Revision: https://reviews.llvm.org/D56434

llvm-svn: 350838
2019-01-10 16:21:08 +00:00
Dmitry Venikov 60d71e4684 [llvm-symbolizer] Add -p as alias to -pretty-print
Summary: Provides -p as a short alias for -pretty-print. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40076

Reviewers: samsonov, khemant, ruiu, rnk, fjricci, jhenderson

Reviewed By: jhenderson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56542

llvm-svn: 350832
2019-01-10 15:33:35 +00:00
Alex Bradbury 6f302b8a69 [RISCV][MC] Add support for evaluating constant symbols as immediates
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:

.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)

.equ CONST, 1
slli a0, a0, CONST

Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:

addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456

Thanks to Shiva Chen for suggesting a better way to approach this during review.

Differential Revision: https://reviews.llvm.org/D52298

llvm-svn: 350831
2019-01-10 15:33:17 +00:00
Sanjay Patel 87ae1460f7 [x86] fix remaining miscompile bug in horizontal binop matching (PR40243)
When we use the partial-matching function on a 128-bit chunk, we must 
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.

This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243

llvm-svn: 350830
2019-01-10 15:27:23 +00:00
Sanjay Patel ed5cfc6792 [x86] fix horizontal binop matching for 256-bit vectors (PR40243)
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the 
existing (old) horizontal op matching function because it does not model the way 
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). 
A potential follow-up change for that is discussed in the bug report - see also D56490.

This generally duplicates a lot of the existing matching code, but we can't just remove 
that without introducing regressions, so the existing code is renamed and used less often. 
Follow-ups may try to reduce that overlap.

Differential Revision: https://reviews.llvm.org/D56450

llvm-svn: 350826
2019-01-10 15:04:52 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
George Rimar 70d197d466 [llvm-objdump] - Implement -z/--disassemble-zeroes.
This is https://bugs.llvm.org/show_bug.cgi?id=37151,

GNU objdump spec says that "Normally the disassembly output will skip blocks of zeroes.",
but currently, llvm-objdump prints them.

The patch implements the -z/--disassemble-zeroes option and switches the default to always
skip blocks of zeroes.

Differential revision: https://reviews.llvm.org/D56083

llvm-svn: 350823
2019-01-10 14:55:26 +00:00
Simon Pilgrim 8c221b3f76 [X86] Add SSE41 vector abs tests
llvm-svn: 350822
2019-01-10 14:26:15 +00:00
James Henderson 3a6a5a3370 [llvm-symbolizer] Add support for specifying addresses on command-line
See https://bugs.llvm.org/show_bug.cgi?id=40070.

GNU addr2line accepts input addresses both on the command-line and via
stdin. llvm-symbolizer previously only supported the latter. This
change adds support for the former. As with addr2line, the new
behaviour is to only look for addresses on stdin if no positional
arguments were provided to llvm-symbolizer.

Reviewed by: ruiu

Differential Revision: https://reviews.llvm.org/D56272

llvm-svn: 350821
2019-01-10 14:10:02 +00:00
Bjorn Pettersson 15313a3705 Fix RUN line in test/Transforms/LoopDeletion/crashbc.ll
llvm-svn: 350807
2019-01-10 09:58:23 +00:00
Sam Parker 7208221452 [ARM] Size reduce teq to eors
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.

Differential Revision: https://reviews.llvm.org/D56255

llvm-svn: 350801
2019-01-10 08:36:33 +00:00
Craig Topper 5d20eb240f [X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.

Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary.

The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.

Fixes PR39741

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56460

llvm-svn: 350800
2019-01-10 07:43:54 +00:00
Zi Xuan Wu 64c956eea8 Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This re-commit r350685.

Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350799
2019-01-10 06:20:14 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00