Commit Graph

27435 Commits

Author SHA1 Message Date
Sanjay Patel 252660c1ff [x86] move misplaced tests; NFC
Mixed up integer and FP in rL349923.

llvm-svn: 349928
2018-12-21 17:06:43 +00:00
Jessica Paquette 453ab1db5b [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Sanjay Patel 41eebdefa7 [x86] add tests for possible horizontal op transform; NFC
llvm-svn: 349923
2018-12-21 16:49:41 +00:00
Sanjay Patel fef39ecd31 [x86] move test for movddup; NFC
This adds an AVX512 run as suggested in D55936.
The test didn't really belong with other build vector tests
because that's not the pattern here. I don't see much value 
in adding 64-bit RUNs because they wouldn't exercise the 
isel patterns that we're aiming to expose.

llvm-svn: 349920
2018-12-21 16:08:27 +00:00
Luke Cheeseman 41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When an exception is thrown or a breakpoint is reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have an FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame', which tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Thomas Lively b6dac89c87 [WebAssembly] Fix invalid machine instrs in -O0, verify in tests
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55956

llvm-svn: 349889
2018-12-21 06:58:15 +00:00
Matt Arsenault 3eae3c4590 AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote
llvm-svn: 349882
2018-12-21 03:20:54 +00:00
Matt Arsenault f4c21c575a AMDGPU/GlobalISel: RegBankSelect for some fp ops
llvm-svn: 349880
2018-12-21 03:14:45 +00:00
Matt Arsenault bee2ad7185 AMDGPU/GlobalISel: Redo legality for build_vector
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.

Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.

llvm-svn: 349878
2018-12-21 03:03:11 +00:00
Craig Topper 7b78137403 [X86] Autogenerate complete checks. NFC
llvm-svn: 349870
2018-12-21 01:27:33 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Jessica Paquette a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
Simon Pilgrim 2a25360ae3 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

Clang counterpart: https://reviews.llvm.org/D55937

Differential Revision: https://reviews.llvm.org/D55938

llvm-svn: 349795
2018-12-20 19:01:07 +00:00
Sanjay Patel 18b008b577 [x86] add test to show missed movddup load fold; NFC
llvm-svn: 349773
2018-12-20 17:05:57 +00:00
Amilendra Kodithuwakku 388bb86d7b Test commit
Fix a simple typo.

llvm-svn: 349771
2018-12-20 16:44:26 +00:00
Krzysztof Parzyszek 30c42e2ab6 [Hexagon] Add patterns for funnel shifts
llvm-svn: 349770
2018-12-20 16:39:20 +00:00
Simon Pilgrim b208255fe0 [SelectionDAGBuilder] Enable funnel shift building to custom rotates
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.

AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
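
A minimal sketch, not taken from the patch, of why a funnel shift with both inputs equal behaves as a rotate, including the modulo handling of the shift amount. The 8-bit width and the helper names (fshl, fshr, rotl, rotr) are assumptions chosen for illustration:

```python
BITS = 8                      # assumed lane width for the illustration
MASK = (1 << BITS) - 1


def fshl(hi, lo, amt):
    """Funnel shift left: shift the double-width value hi:lo left, keep the top half."""
    amt %= BITS               # shift amount is taken modulo the bit width
    return (((hi << BITS) | lo) << amt >> BITS) & MASK


def fshr(hi, lo, amt):
    """Funnel shift right: shift hi:lo right, keep the bottom half."""
    amt %= BITS
    return (((hi << BITS) | lo) >> amt) & MASK


def rotl(x, amt):
    amt %= BITS
    return ((x << amt) | (x >> (BITS - amt))) & MASK


def rotr(x, amt):
    amt %= BITS
    return ((x >> amt) | (x << (BITS - amt))) & MASK


def funnel_matches_rotate():
    """Exhaustively compare fshl(x, x, s) / fshr(x, x, s) with rotl / rotr."""
    return all(
        fshl(x, x, s) == rotl(x, s) and fshr(x, x, s) == rotr(x, s)
        for x in range(1 << BITS)
        for s in range(3 * BITS)   # includes out-of-range amounts
    )
```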

Differential Revision: https://reviews.llvm.org/D55747

llvm-svn: 349765
2018-12-20 14:56:44 +00:00
Simon Pilgrim 09c081176a [X86][AVX512] Don't custom lower v16i8 rotations.
As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.

llvm-svn: 349763
2018-12-20 14:38:35 +00:00
Ulrich Weigand 44d37ae38c [SystemZ] Make better use of VLLEZ
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:

- For the floating-point versions, we have ISel patterns that match
  on a bitconvert as the top node.  In more complex cases, that
  bitconvert may already have been merged into something else.
  Fix the patterns to match the inner nodes instead.

- For the 64-bit integer versions, depending on the surrounding code,
  we may get either a DAG tree based on JOIN_DWORDS or one based on
  INSERT_VECTOR_ELT.  Use a PatFrags to simply match both variants.

llvm-svn: 349749
2018-12-20 13:05:03 +00:00
Ulrich Weigand 8bb46b0f01 [SystemZ] Make better use of VGEF/VGEG
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use.  This
(erroneously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily.  Fixed by this patch.

llvm-svn: 349748
2018-12-20 13:01:20 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Update PPC IR following GEP->bitcast to bitcast->GEP->bitcast change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Ulrich Weigand f43b510015 [SystemZ] Make better use of VLDEB
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we can use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLDEB (vector floating-point extends) has been
missing so far.  This patch adds corresponding DAG combine
support for FP_EXTEND.

llvm-svn: 349746
2018-12-20 12:59:05 +00:00
Simon Pilgrim 6bbf39b48c [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
Pulled out of D55894 to match the clang changes in D55890.

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349744
2018-12-20 11:53:54 +00:00
Simon Pilgrim e85ad60ee0 [X86] Update PADDSW/PSUBSW intrinsic usage with generic saturated intrinsics.
As discussed on D55894, this makes no difference to the actual test.

llvm-svn: 349742
2018-12-20 11:14:56 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
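
A minimal sketch, not taken from the patch, of how two overlapping loads can cover a size such as 7 bytes. The function names and the choice of a 4-byte load width are assumptions for illustration; the second load starts at n - width, so it re-reads bytes already known to be equal:

```python
def load_be(buf, offset, width):
    """Model a big-endian load of `width` bytes starting at `offset`."""
    return int.from_bytes(buf[offset:offset + width], "big")


def memcmp_overlapping(a, b, n, width):
    """Compare the first n bytes with two loads; requires width <= n <= 2 * width."""
    assert width <= n <= 2 * width
    for offset in (0, n - width):          # second load overlaps the first
        x = load_be(a, offset, width)
        y = load_be(b, offset, width)
        if x != y:
            # Big-endian integers order the same way as the bytes do.
            return -1 if x < y else 1
    return 0


def reference_sign(a, b, n):
    """Byte-wise lexicographic comparison reduced to -1, 0 or 1."""
    x, y = a[:n], b[:n]
    return (x > y) - (x < y)
```

For n = 7 and width = 4 the loads cover bytes [0, 4) and [3, 7), so byte 3 is read twice.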

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Kang Zhang ca8db48974 [PowerPC] Implement the isSelectSupported() target hook
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects; that is, it does not support scalar condition
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.

Reviewed By: steven.zhang,  hfinkel

Differential Revision: https://reviews.llvm.org/D55754

llvm-svn: 349727
2018-12-20 06:19:59 +00:00
Thomas Lively feb18fe927 [WebAssembly] Emit a splat for v128 IMPLICIT_DEF
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55910

llvm-svn: 349724
2018-12-20 04:20:32 +00:00
Amara Emerson 321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Thomas Lively 8dbf29af95 [WebAssembly] Gate unimplemented SIMD ops on flag
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55904

llvm-svn: 349720
2018-12-20 02:10:22 +00:00
Matt Arsenault 4339883710 AMDGPU: Make i1/i64/v2i32 and/or/xor legal
The 64-bit types do depend on the register bank,
but that's another issue to deal with later.

llvm-svn: 349716
2018-12-20 01:35:49 +00:00
Matt Arsenault 8cc98bee8a AMDGPU/GlobalISel: Fix ValueMapping tables for i1
This was incorrectly selecting SGPR for any i1 values,
e.g. G_TRUNC to i1 from a VGPR was still an SGPR.

llvm-svn: 349715
2018-12-20 01:33:43 +00:00
Amara Emerson 8cb186ce17 [AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway, since this pattern accounts for a
significant proportion of our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Matt Arsenault dff33c38e1 AMDGPU/GlobalISel: RegBankSelect for fp conversions
llvm-svn: 349709
2018-12-20 00:37:02 +00:00
Matt Arsenault 36d4092173 AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
llvm-svn: 349708
2018-12-20 00:33:49 +00:00
Rhys Perry 3931ad38b9 AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts
Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55058

llvm-svn: 349694
2018-12-19 22:53:33 +00:00
Sanjay Patel ca6434de37 [x86] add test to show ddup hole; NFC (PR37502)
llvm-svn: 349680
2018-12-19 20:35:28 +00:00
Jessica Paquette 3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Craig Topper 84a00bd98a [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM instructions like BLSR or BLSI.

This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect.

Differential Revision: https://reviews.llvm.org/D55870

llvm-svn: 349661
2018-12-19 18:49:13 +00:00
Craig Topper 291470347a [X86] Fix assert fails in pass X86AvoidSFBPass
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743

The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in an assertion failure.

This patch refines the function to check all previous layouts until it finds the uncontained one, so all redundant stores will be removed.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D55642

llvm-svn: 349660
2018-12-19 18:45:57 +00:00
Simon Pilgrim 171f3aa012 [X86] Remove already upgraded llvm.x86.avx512.mask.padds/psubs tests
Duplicate tests have already been moved to avx512bw-intrinsics-upgrade.ll

llvm-svn: 349643
2018-12-19 17:18:27 +00:00
Yonghong Song 7b410ac352 [BPF] Generate BTF DebugInfo under BPF target
This patch implements BTF (BPF Type Format).
The BTF is the debug info format for BPF, introduced
in the below linux patch:
  69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
and further extended several times, e.g.,
  https://www.spinics.net/lists/netdev/msg534640.html
  https://www.spinics.net/lists/netdev/msg538464.html
  https://www.spinics.net/lists/netdev/msg540246.html

The main advantages of implementing in LLVM are:
   . better integration/deployment as no extra tools are needed.
   . bpf JIT based compilation (like bcc, bpftrace, etc.) can get
     BTF without much extra effort.
   . BTF line_info needs selective source codes, which can be
     easily retrieved when inside the compiler.

This patch implemented BTF generation by registering a BPF
specific DebugHandler in BPFAsmPrinter.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D55752

llvm-svn: 349640
2018-12-19 16:40:25 +00:00
Amy Kwan 22cd453ba7 Test commit
llvm-svn: 349633
2018-12-19 15:21:07 +00:00
Simon Pilgrim 7bfbf3caa4 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm)
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.

I'm intending to deal with the signed saturation equivalents as well.

Clang counterpart: https://reviews.llvm.org/D55879

Differential Revision: https://reviews.llvm.org/D55855

llvm-svn: 349630
2018-12-19 14:43:36 +00:00
Simon Pilgrim 2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
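
A minimal sketch, not taken from the patch, that checks the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) identity for fully defined constants. It does not model UNDEF elements; the 4-bit width and function names are assumptions for illustration:

```python
BITS = 4                      # small width so the check can be exhaustive
VALUES = range(1 << BITS)


def before_fold(x, c1, c2):
    return (x & c1) | c2


def after_fold(x, c1, c2):
    return (x | c2) & (c1 | c2)


def fold_is_equivalent():
    """Compare both forms for every x, c1 and c2 of the chosen width."""
    return all(
        before_fold(x, c1, c2) == after_fold(x, c1, c2)
        for x in VALUES
        for c1 in VALUES
        for c2 in VALUES
    )
```

Because both forms are purely bitwise, agreement at a small width implies agreement at any width.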

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim 6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should be guaranteed to at least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear), but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Simon Pilgrim ac62c8a3aa [X86][SSE] Remove use of SSE ADDS/SUBS saturation intrinsics from schedule/stack tests
These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now.

The avx512 masked cases need doing as well but require a bit of tidyup first.

llvm-svn: 349621
2018-12-19 12:00:25 +00:00
Nicolai Haehnle 8d5e974076 AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.

Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.

Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55369

llvm-svn: 349620
2018-12-19 11:55:03 +00:00
Simon Pilgrim 2072b5afbe [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.

This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.

I've updated SelectionDAG::simplifyShift to demonstrate its use.

Differential Revision: https://reviews.llvm.org/D55819

llvm-svn: 349616
2018-12-19 10:41:06 +00:00
Simon Pilgrim d4b077698a [X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).

The avx512 masked cases need doing as well but require a bit of tidyup first.

llvm-svn: 349615
2018-12-19 10:39:14 +00:00
Carl Ritson c521ac3a44 AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.

Irreducible loop test has been extended based on a CTS failure detected for GFX9.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D55602

llvm-svn: 349611
2018-12-19 10:17:49 +00:00
Diana Picus 6c35a1e5af [ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.

This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.

llvm-svn: 349610
2018-12-19 09:55:10 +00:00
Matt Arsenault b110e2277c AMDGPU/GlobalISel: Regbankselect for fsub
llvm-svn: 349608
2018-12-19 09:07:58 +00:00
Kewen Lin a6247e7cf4 [PowerPC]Exploit P9 vabsdu for unsigned vselect patterns
For types v4i32/v8i16/v16i8, do the following transforms:
  (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
  (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
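
A minimal sketch, not taken from the patch, showing that each of the four vselect forms above computes the unsigned absolute difference for a single lane. One 8-bit lane (as in v16i8) is assumed, and the helper names are hypothetical:

```python
MASK = 0xFF                   # model one unsigned 8-bit lane


def sub(a, b):
    """Wrapping lane subtraction, as the vector sub would perform."""
    return (a - b) & MASK


def absdiff(a, b):
    """Unsigned absolute difference, the result the vabsd node stands for."""
    return max(a, b) - min(a, b)


def vselect_forms(a, b):
    """Evaluate the four select patterns for one lane."""
    return [
        sub(a, b) if a > b else sub(b, a),    # setugt
        sub(a, b) if a >= b else sub(b, a),   # setuge
        sub(b, a) if a < b else sub(a, b),    # setult
        sub(b, a) if a <= b else sub(a, b),   # setule
    ]


def all_forms_match():
    return all(
        form == absdiff(a, b)
        for a in range(256)
        for b in range(256)
        for form in vselect_forms(a, b)
    )
```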

Differential Revision: https://reviews.llvm.org/D55812

llvm-svn: 349599
2018-12-19 03:04:07 +00:00
Pete Cooper f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's.  Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Simon Pilgrim 73c8685295 [AARCH64] Added test case for PR40091
llvm-svn: 349543
2018-12-18 21:05:22 +00:00
Craig Topper 18a9d545e1 [X86] Add BSR to isUseDefConvertible.
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.

This addresses one issue from PR40090.

llvm-svn: 349531
2018-12-18 20:03:54 +00:00
Farhana Aleen 59ee2c5362 [AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset().
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
         AMDGPU::V_ADDC_U32_e64. Therefore, we don't need any additional operand size-check-assert.

Author: FarhanaAleen
llvm-svn: 349529
2018-12-18 19:58:39 +00:00
Nikita Popov f6058ff140 [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.

This is a followup to D55787 and part of PR40056.

Differential Revision: https://reviews.llvm.org/D55833

llvm-svn: 349520
2018-12-18 18:28:22 +00:00
Craig Topper 20a6db5a84 [X86] Create PSUBUS from (add (umax X, C), -C)
InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant.

This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.
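
A minimal sketch, not taken from the patch, of why (add (umax X, C), -C) equals an unsigned saturating subtract. One 8-bit lane is assumed and the function names are hypothetical:

```python
BITS = 8
MASK = (1 << BITS) - 1


def usub_sat(x, c):
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return max(x - c, 0)


def add_umax_form(x, c):
    """The canonicalized form: wrapping add of umax(x, c) and the negated constant."""
    neg_c = (-c) & MASK
    return (max(x, c) + neg_c) & MASK


def forms_match():
    return all(
        add_umax_form(x, c) == usub_sat(x, c)
        for x in range(1 << BITS)
        for c in range(1 << BITS)
    )
```

When x < c the umax yields c and the sum wraps to zero; otherwise the sum wraps to x - c.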

Fixes some of PR40053

Differential Revision: https://reviews.llvm.org/D55780

llvm-svn: 349519
2018-12-18 18:26:25 +00:00
Michael Berg c6a5245cf7 Add FMF management to common fp intrinsics in GlobalIsel
Summary: This is the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel.  Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.

Reviewers: aditya_nandakumar, bogner

Reviewed By: bogner

Subscribers: rovka, kristof.beyls, javed.absar

Differential Revision: https://reviews.llvm.org/D55668

llvm-svn: 349514
2018-12-18 17:54:52 +00:00
Simon Pilgrim 1411917431 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349510
2018-12-18 17:31:11 +00:00
Simon Pilgrim e9effe9744 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349500
2018-12-18 16:02:23 +00:00
Petar Avramovic 0a5e4eb776 [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.

Differential Revision: https://reviews.llvm.org/D55651

llvm-svn: 349499
2018-12-18 15:59:51 +00:00
Simon Pilgrim be0fbe673e [X86][SSE] Add shift combine 'out of range' tests with UNDEFs
Shows failure to simplify out of range shift amounts to UNDEF if any element is UNDEF.

llvm-svn: 349483
2018-12-18 13:37:04 +00:00
Nikita Popov 665ab08178 [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.

This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.

Differential Revision: https://reviews.llvm.org/D55787

llvm-svn: 349481
2018-12-18 13:23:03 +00:00
Nikita Popov a7d2a235bb [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests
Integer result promotion needs to use the scalar size, and we need
support for result widening.

This is in preparation for D55787.

llvm-svn: 349480
2018-12-18 13:22:53 +00:00
Petar Avramovic 150fd430f6 [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other than s32
with clampScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D55362

llvm-svn: 349475
2018-12-18 11:36:14 +00:00
Luke Cheeseman f57d7d8237 [AArch64] - Return address signing dwarf support
- Reapply changes initially introduced in r343089
- The architecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (sanitizers) make use of the dwarf context classes but
  do not initialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472
2018-12-18 10:37:42 +00:00
Simon Pilgrim ba8e84b31c [X86][AVX] Add 256/512-bit vector funnel shift tests
Extra coverage for D55747

llvm-svn: 349471
2018-12-18 10:32:54 +00:00
Simon Pilgrim 46b90e851b [X86][SSE] Add 128-bit vector funnel shift tests
Extra coverage for D55747

llvm-svn: 349470
2018-12-18 10:08:23 +00:00
Matt Arsenault c94e26c71d AMDGPU: Legalize/regbankselect frame_index
llvm-svn: 349468
2018-12-18 09:46:13 +00:00
Matt Arsenault c0ea221068 AMDGPU: Legalize/regbankselect fma
llvm-svn: 349467
2018-12-18 09:39:56 +00:00
Simon Pilgrim af6fbbf18b [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.

llvm-svn: 349466
2018-12-18 09:33:25 +00:00
Matt Arsenault e01e7c81f2 AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub
llvm-svn: 349463
2018-12-18 09:19:03 +00:00
Simon Pilgrim 26c630f416 [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold.
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same).
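
A minimal sketch, not taken from the patch, of the property the fold relies on: an arithmetic right shift never changes the sign bit, so a following logical shift by 31 sees the same value. One 32-bit lane and in-range shift amounts (0 to 31) are assumed, and the helper names are hypothetical:

```python
BITS = 32
MASK = (1 << BITS) - 1


def vsrai(x, amt):
    """Arithmetic shift right of one lane, for 0 <= amt < BITS."""
    signed = x - (1 << BITS) if x >> (BITS - 1) else x
    return (signed >> amt) & MASK


def vsrli(x, amt):
    """Logical shift right of one lane."""
    return (x & MASK) >> amt


def sign_bit_survives(samples):
    """Check (VSRLI (VSRAI x, y), 31) == (VSRLI x, 31) for each sample."""
    return all(
        vsrli(vsrai(x, y), BITS - 1) == vsrli(x, BITS - 1)
        for x in samples
        for y in range(BITS)
    )
```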

Test change is merely a rescheduling.

llvm-svn: 349459
2018-12-18 08:55:47 +00:00
Kristof Beyls e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnerabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask off data loaded
  in registers.
- using intrinsics to mask off data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
  selectors to not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.
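
A minimal sketch, not taken from the pass, of the taint convention described in the list above: the taint value is all-ones on a correctly predicted path and all-zeros otherwise, so a plain AND masks data. The 64-bit width mirrors X16; the function names are hypothetical, and the CSEL is modelled as an ordinary conditional:

```python
ALL_ONES = (1 << 64) - 1      # taint value when no miss-speculation is detected


def track(taint, condition_really_holds):
    """Model of the CSEL step: keep the taint only if the branch condition
    that led to this block actually holds, otherwise force it to zero."""
    return taint if condition_really_holds else 0


def harden(value, taint):
    """Model of the masking step: an AND with the taint register."""
    return value & taint
```

Once the taint is zero it stays zero on later blocks, since track() can only preserve or clear it.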

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Martin Storsjo 8f0cb9c3a8 [AArch64] [MinGW] Allow enabling SEH exceptions
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451
2018-12-18 08:32:37 +00:00
Craig Topper 284d426f6d [X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used.
A similar thing happens to TBM instructions, which we already have tests for.

llvm-svn: 349450
2018-12-18 08:26:01 +00:00
Kewen Lin bbb461f758 [PowerPC][NFC]Update vabsd cases with vselect test cases
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the original test case here; later the exploitation patch will update this
and reviewers can check the differences easily.

llvm-svn: 349446
2018-12-18 08:11:32 +00:00
Kewen Lin 44ace92596 [PowerPC] Exploit power9 new instruction setb
Check the expected patterns feeding to SELECT_CC like:
   (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
If one of them matches, further transform the sequence to comparison + setb.

Differential Revision: https://reviews.llvm.org/D53275

llvm-svn: 349445
2018-12-18 07:53:26 +00:00
QingShan Zhang ecdab5bdd8 [NFC] Add new test to cover the lhs scheduling issue for P9.
llvm-svn: 349443
2018-12-18 06:32:42 +00:00
Craig Topper 4adf9ca738 [X86] Add test case for PR40060. NFC
llvm-svn: 349441
2018-12-18 04:58:07 +00:00
QingShan Zhang f549812599 [NFC] Fix test case with wrong label check.
llvm-svn: 349439
2018-12-18 04:25:41 +00:00
Kewen Lin 3dac1252da [PowerPC] Improve vec_abs on P9
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.

Differential Revision: https://reviews.llvm.org/D54783

llvm-svn: 349437
2018-12-18 03:16:43 +00:00
Craig Topper 6a6f6109c4 [X86] Add baseline tests for D55780
This adds tests for (add (umax X, C), -C) as part of fixing PR40053
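
For readers unfamiliar with the pattern: for unsigned values, (add (umax X, C), -C) computes the same result as an unsigned saturating subtract of C from X. Whether the follow-up patch lowers it that way is an assumption here; the sketch below only checks the arithmetic identity, using a hypothetical 8-bit element width.

```python
BITS = 8                      # hypothetical element width, chosen for the sketch
MASK = (1 << BITS) - 1


def add_umax_negc(x, c):
    """(add (umax X, C), -C) with wrapping BITS-wide arithmetic."""
    return (max(x, c) + ((-c) & MASK)) & MASK


def usubsat(x, c):
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return x - c if x > c else 0


def patterns_agree():
    """Exhaustively compare both forms over every 8-bit X and C."""
    return all(add_umax_negc(x, c) == usubsat(x, c)
               for x in range(MASK + 1) for c in range(MASK + 1))
```

Calling patterns_agree() checks all 65536 input pairs.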

llvm-svn: 349416
2018-12-17 23:20:14 +00:00
Simon Pilgrim 7e2975a44c [X86][SSE] Improve immediate vector shift known bits handling.
Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions.

Fixes the poor codegen comments in D55768.

llvm-svn: 349407
2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen d3c544aa6e [WebAssembly] Fix assembler parsing of br_table.
Summary:
We use `variable_ops` in the tablegen defs to denote the list of
branch targets in `br_table`, but unlike other uses of `variable_ops`
(e.g. call) these branch targets need to actually be encoded in the
instruction. The existing tables for `variable_ops` cause no operands
to be accepted by the assembly matcher.

Following the example of ARM:
2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)
we introduce a new operand type to capture this list, and we use the
same {} syntax as ARM as well to differentiate them from regular
integer operands.

Also removed definition and use of TSFlags in tablegen defs, since
`br_table` now has a non-variable_ops immediate operand, so the
previous logic of only the variable_ops arguments being labels didn't
make sense anymore.

Reviewers: dschuff, aheejin, sunfish

Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55401

llvm-svn: 349405
2018-12-17 22:04:44 +00:00
Craig Topper 8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Craig Topper 728cbc0378 Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used.
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.

This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.

This appears to fix the case in PR39968.
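
The equivalence behind the transform can be sketched as follows. This is a minimal Python model, not the X86 lowering code; the 32-bit width and the helper names are assumptions made for illustration. A logical right shift by C is zero exactly when the top 32 - C bits of X are zero, and a left shift by C is zero exactly when the low 32 - C bits are zero, so the shift can be replaced by an AND with the matching mask C'.

```python
BITS = 32                      # assumed operand width for the sketch
ALL_ONES = (1 << BITS) - 1


def srl_is_zero(x, c):
    """Zero flag of (CMP (srl X, C), 0)."""
    return (x >> c) == 0


def shl_is_zero(x, c):
    """Zero flag of (CMP (shl X, C), 0), with the shift truncated to BITS."""
    return ((x << c) & ALL_ONES) == 0


def mask_for_srl(c):
    """C' for srl: the bits of X that survive a right shift by c."""
    return (ALL_ONES << c) & ALL_ONES


def mask_for_shl(c):
    """C' for shl: the bits of X that survive a left shift by c."""
    return ALL_ONES >> c


def and_is_zero(x, mask):
    """Zero flag of (CMP (and X, C'), 0), i.e. what TEST computes."""
    return (x & mask) == 0
```

Only the zero flag is modelled; other flags differ between the two forms, which is why the transform is restricted to the case where only the zero flag is used.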

llvm-svn: 349385
2018-12-17 20:02:16 +00:00
Simon Pilgrim 9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Tim Northover 256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Tim Northover ae3b66b7b0 ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

llvm-svn: 349355
2018-12-17 15:05:32 +00:00
Simon Pilgrim 193429ea15 Regenerate test in prep for SimplifyDemandedBits improvements.
llvm-svn: 349350
2018-12-17 12:48:34 +00:00
Petar Avramovic b8276f2280 [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD
Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D54580

llvm-svn: 349346
2018-12-17 12:31:07 +00:00
Alexandros Lamprineas 490ae11717 [AArch64] Re-run load/store optimizer after aggressive tail duplication
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.

llvm-svn: 349338
2018-12-17 10:45:43 +00:00
Craig Topper 792d4f130d [X86] Add test case for PR39968. NFC
llvm-svn: 349331
2018-12-17 07:51:17 +00:00
Kewen Lin c68ce89ae1 [Power9][NFC]update vabsd case for better dumping
Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get
more descriptive output. Also removed useless function attributes.

llvm-svn: 349329
2018-12-17 06:32:02 +00:00
Kewen Lin 3ee103085e [Power9][NFC]Make pre-inc-disable case more robust
With a patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results.
But it's a false alarm; we should make the case more robust.

llvm-svn: 349325
2018-12-17 03:16:12 +00:00
Simon Pilgrim 0dea14f2f2 Regenerate test (merges X86+X64 cases). NFCI.
llvm-svn: 349317
2018-12-16 19:07:57 +00:00
Craig Topper 10f8892837 [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine.
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.

The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel.

The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.

llvm-svn: 349315
2018-12-16 18:35:55 +00:00
Craig Topper b0b9c54578 [X86] Autogenerate complete checks. NFC
llvm-svn: 349314
2018-12-16 18:35:54 +00:00
Sanjay Patel 13ac2f15b0 [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859)
This is part of fixing PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859

We have a crippled vector ISA, so we have to invert a typical fold and create min/max here.
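
One reading of "invert a typical fold and create min/max" (an assumption; the text above does not spell out the exact fold) is that an unsigned X > C becomes X == umax(X, C+1) and an unsigned X < C becomes X == umin(X, C-1), so that only an equality compare plus a min/max is needed. A small Python check of those identities over a hypothetical 8-bit element type:

```python
BITS = 8                      # hypothetical element width
MAXV = (1 << BITS) - 1


def ugt_via_umax(x, c):
    """X >u C rewritten as X == umax(X, C + 1); needs C below the maximum."""
    assert c < MAXV, "C + 1 must not wrap"
    return x == max(x, c + 1)


def ult_via_umin(x, c):
    """X <u C rewritten as X == umin(X, C - 1); needs C above zero."""
    assert c > 0, "C - 1 must not wrap"
    return x == min(x, c - 1)
```

The guards on C reflect that incrementing or decrementing the constant must not overflow.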

As discussed in the bug report, we can probably do better by using saturating subtract when 
it's available, but we should have this improvement for the min/max patterns regardless.

Alive proofs:
https://rise4fun.com/Alive/zsf
https://rise4fun.com/Alive/Qrl

Differential Revision: https://reviews.llvm.org/D55515

llvm-svn: 349304
2018-12-16 15:05:48 +00:00
Sanjay Patel f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)
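
The identity itself is easy to check in isolation. The following is a minimal Python sketch (helper names are made up for illustration and do not correspond to DAGCombiner code): truncation keeps the low bits, and bitwise logic operates on each bit independently, so the two orders give the same result.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits, like an integer truncate."""
    return value & ((1 << bits) - 1)


LOGIC_OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def logic_of_truncates(op, x, y, bits):
    """logic_op (truncate x), (truncate y)"""
    return LOGIC_OPS[op](truncate(x, bits), truncate(y, bits))


def truncate_of_logic(op, x, y, bits):
    """truncate (logic_op x, y)"""
    return truncate(LOGIC_OPS[op](x, y), bits)
```

The sketch says nothing about profitability, which is what the other checks mentioned in the commit are for.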

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim 0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Simon Pilgrim 780b3ca775 [X86] Add computeKnownBits tests for funnel shift intrinsics
llvm-svn: 349297
2018-12-16 12:15:31 +00:00
Craig Topper 392edb6223 [X86] Autogenerate complete checks. NFC
llvm-svn: 349287
2018-12-15 22:52:57 +00:00
Simon Pilgrim ef7b5949e5 [X86] Lower to SHLD/SHRD on slow machines for optsize
Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard.

llvm-svn: 349285
2018-12-15 19:43:44 +00:00
Simon Pilgrim 53c8b1b6f7 [X86] Add optsize SHLD/SHRD tests
llvm-svn: 349284
2018-12-15 19:32:26 +00:00
Dinar Temirbulatov 8c8724dd0d [CodeGen] Enhance machine PHIs optimization
Summary:
Make the machine PHI optimization work for a single-value register taken from
several different copies. This is the first step to fix PR38917. This change
allows getting rid of redundant PHIs (see opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.

For instance, before this patch the code like this:

%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b
could be optimized to:

%a = %b
but the code like this:

%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c
would remain unchanged.
With this patch the latter case will be optimized:

%a = %z
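
The idea can be sketched in a few lines of Python. This is a simplified model, not the pass's implementation: registers are plain strings, `copy_source` is a hypothetical map from a COPY's destination to its source, and basic blocks are ignored.

```python
def resolve_through_copies(reg, copy_source):
    """Follow COPY definitions until reaching a register that is not a copy."""
    seen = set()
    while reg in copy_source and reg not in seen:
        seen.add(reg)
        reg = copy_source[reg]
    return reg


def single_phi_value(phi_def, incoming, copy_source):
    """Return the one value a PHI can take, or None if there are several.

    phi_def:  register defined by the PHI
    incoming: incoming registers, one per predecessor block
    """
    values = set()
    for reg in incoming:
        if reg == phi_def:      # self-reference around a loop adds no new value
            continue
        values.add(resolve_through_copies(reg, copy_source))
    return values.pop() if len(values) == 1 else None
```

With copy_source = {"%b": "%z", "%c": "%z"}, the three-operand PHI from the message resolves to "%z"; as soon as one incoming register traces back to a different source, the function returns None and the PHI stays.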

Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com

Reviewers: RKSimon, MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54839

llvm-svn: 349271
2018-12-15 14:37:01 +00:00
Simon Pilgrim bfbe510d4f Regenerate neon copy tests. NFCI.
llvm-svn: 349270
2018-12-15 14:23:18 +00:00
Simon Pilgrim 1e1fd9c761 [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts
Differential Revision: https://reviews.llvm.org/D55600

llvm-svn: 349264
2018-12-15 11:36:36 +00:00
Kewen Lin 3ac031bb8f [Power9][NFC] add setb exploitation test case
Add an original test case for setb before the exploitation actually takes effect, later we can check the difference.

Differential Revision: https://reviews.llvm.org/D55696

llvm-svn: 349251
2018-12-15 04:39:37 +00:00
Artem Belevich 6d74bd638a [NVPTX] Lower instructions that expand into libcalls.
The change is an effort to split and refactor abandoned
D34708 into smaller parts.

Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.

With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55145

llvm-svn: 349213
2018-12-14 23:53:06 +00:00
Krzysztof Parzyszek 26d994f56e [Hexagon] Add patterns for shifts of v2i16
This fixes https://llvm.org/PR39983.

llvm-svn: 349202
2018-12-14 22:33:48 +00:00
Volkan Keles 574d737e06 [GlobalISel] LegalizerHelper: Implement fewerElementsVector for G_LOAD/G_STORE
Reviewers: aemerson, dsanders, bogner, paquette, aditya_nandakumar

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53728

llvm-svn: 349200
2018-12-14 22:11:20 +00:00
Farhana Aleen ce095c564a [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions.
Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions.
 E.g.
  s_movk_i32 s0, 0x1800
  v_add_co_u32_e32 v0, vcc, s0, v2
  v_addc_co_u32_e32 v1, vcc, 0, v6, vcc

  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[0:1], off
  =>
  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[5:6], off offset:2048
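
The arithmetic in the example can be sketched as below. This is a minimal Python model, not the pass itself; it assumes the 13-bit offset is signed, and the function names are made up for illustration. The second address differs from the first by 0x1800 - 0x1000 = 0x800 = 2048, which fits, so its add/addc pair can be dropped in favour of `offset:2048`.

```python
OFFSET_BITS = 13                               # from the summary above
MIN_OFFSET = -(1 << (OFFSET_BITS - 1))         # assumes a signed immediate field
MAX_OFFSET = (1 << (OFFSET_BITS - 1)) - 1


def fits_immediate(delta):
    """True if delta can be encoded in the instruction's offset field."""
    return MIN_OFFSET <= delta <= MAX_OFFSET


def rebase(anchor_const, other_const):
    """Offset to add to the anchor's address to reach the other one.

    Returns None when the distance does not fit the immediate field, in
    which case the original address computation has to stay.
    """
    delta = other_const - anchor_const
    return delta if fits_immediate(delta) else None
```

For the constants in the message, rebase(0x1000, 0x1800) gives 2048.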

Author: FarhanaAleen

Reviewed By: arsenm, rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D55539

llvm-svn: 349196
2018-12-14 21:13:14 +00:00
Sanjay Patel 7b776863ac [x86] add tests for extractelement of FP binops; NFC
llvm-svn: 349179
2018-12-14 19:15:54 +00:00
Sanjay Patel b7e2d6e493 [ARM] make test immune to scalarization improvements; NFC
llvm-svn: 349177
2018-12-14 18:47:04 +00:00
Sanjay Patel 4f4963b9cb [x86] make tests immune to scalarization improvements; NFC
llvm-svn: 349176
2018-12-14 18:44:16 +00:00
Daniel Sanders ec29eac5dd [globalisel][combiner] Fix r349167 for release mode bots
This test relies on -debug-only which is unavailable in non-asserts builds.

llvm-svn: 349174
2018-12-14 18:25:05 +00:00
Daniel Sanders 629db5d8e5 [globalisel][combiner] Make the CombinerChangeObserver a MachineFunction::Delegate
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().

There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work, this is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.

Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.

In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
  but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
  before and after they are made. This allows us to trace the changes
  in the debug output.
* As a convenience changingAllUsesOfReg() and
  finishedChangingAllUsesOfReg() will report changingInstr() and
  changedInstr() for each use of a given register. This is primarily useful
  for changes caused by MachineRegisterInfo::replaceRegWith()

With this in place, both combine rules have been updated to report their
changes to the observer.

Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp

Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar

Reviewed By: aditya_nandakumar

Subscribers: mgorny, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D52947

llvm-svn: 349167
2018-12-14 17:50:14 +00:00
Sanjay Patel 95f90ef3b3 [AArch64] make test immune to scalarization improvements; NFC
This is explicitly implementing what the comment says rather
than relying on the implicit zext of a constant operand.

llvm-svn: 349166
2018-12-14 17:44:07 +00:00
Sanjay Patel a44dc32708 [SystemZ] make test immune to scalarization improvements; NFC
The undef operands mean this test is probably still too fragile
to accomplish what the comments suggest.

llvm-svn: 349164
2018-12-14 17:28:52 +00:00
Sanjay Patel 25fc03c5c0 [Hexagon] make test immune to scalarization improvements; NFC
llvm-svn: 349163
2018-12-14 17:23:01 +00:00
Sanjay Patel 5a97a105f8 [x86] auto-generate complete checks; NFC
llvm-svn: 349162
2018-12-14 16:49:57 +00:00
Sanjay Patel 41e8112ed6 [x86] regenerate test checks; NFC
llvm-svn: 349161
2018-12-14 16:46:21 +00:00
Sanjay Patel b7d9f9117e [x86] make tests immune to scalarization improvements; NFC
llvm-svn: 349160
2018-12-14 16:44:58 +00:00
Scott Linder de6beb02a5 Implement -frecord-command-line (-frecord-gcc-switches)
Implement options in clang to enable recording the driver command-line
in an ELF section.

Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.

This differs from the GCC implementation in some key ways:

* In GCC there is only one command-line possible per compilation-unit,
  in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
  command-lines are separated by NULL bytes. The advantage of the GCC
  approach is to clearly delineate options in the face of embedded
  spaces. The advantage of the LLVM approach is to support merging
  multiple command-lines unambiguously, while handling embedded spaces
  with escaping.
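
The LLVM-style layout described in the second bullet can be sketched as follows. The escaping scheme (a backslash before spaces and backslashes) and all function names are assumptions made for illustration; the text above only says that embedded spaces are handled with escaping.

```python
def escape_arg(arg):
    """Hypothetical escaping: protect backslashes first, then spaces."""
    return arg.replace("\\", "\\\\").replace(" ", "\\ ")


def encode_command_line(args):
    """One command line: escaped options joined by single spaces."""
    return " ".join(escape_arg(a) for a in args)


def merge_command_lines(command_lines):
    """Whole command lines separated by NUL bytes, as in the LLVM approach."""
    return b"\0".join(encode_command_line(args).encode("utf-8")
                      for args in command_lines)


def split_command_lines(section):
    """Recover the individual (still escaped) command lines from a section."""
    return [part.decode("utf-8") for part in section.split(b"\0")]
```

Because the NUL byte only ever separates complete command lines, merging the sections of several objects keeps each command line intact, which is the advantage the bullet points out.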

Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489

llvm-svn: 349155
2018-12-14 15:38:15 +00:00
John Brawn 1d0d86ae40 [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers
It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's
generated is a KILL of the value), so when creating split constraints if the
live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead
of PrefReg.

Differential Revision: https://reviews.llvm.org/D55652

llvm-svn: 349151
2018-12-14 14:07:57 +00:00
Diana Picus 02c8343c75 [ARM GlobalISel] Thumb2: casts between int and ptr
Mark as legal and add tests. Nothing special to do.

llvm-svn: 349147
2018-12-14 13:45:38 +00:00
Diana Picus acca60b49e [ARM GlobalISel] Remove duplicate test. NFCI
Fixup for r349026. I forgot to delete these test functions from the
original file when I moved them to arm-legalize-exts.mir.

llvm-svn: 349146
2018-12-14 13:28:34 +00:00
Diana Picus 14dc3b2959 [ARM GlobalISel] Allow simple binary ops in Thumb2
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.

Extract the legalizer tests for these opcodes into another file.

Add tests for the instruction selector.

llvm-svn: 349142
2018-12-14 11:58:14 +00:00
Aakanksha Patil bc568766b2 Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute
This patch breaks RADV (and probably RadeonSI as well)

llvm-svn: 349084
2018-12-13 21:23:12 +00:00
Matt Arsenault 934e534c47 AMDGPU/GlobalISel: Legalize/regbankselect block_addr
llvm-svn: 349081
2018-12-13 20:34:15 +00:00
Sanjay Patel 791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Simon Pilgrim b5aaa673c6 [X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 349057
2018-12-13 16:39:29 +00:00
Sanjay Patel c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Simon Pilgrim b0b2f1503a [X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243)
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.

llvm-svn: 349052
2018-12-13 15:50:31 +00:00
Sanjay Patel a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim ba91ff4a86 [X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)
llvm-svn: 349047
2018-12-13 15:23:09 +00:00
Daniel Cederman b5d284408e [Sparc] Use float register for integer constrained with "f" in inline asm
Summary:
Constraining an integer value to a floating point register using "f"
causes an llvm_unreachable to trigger. This patch allows i32 integers
to be placed in a single precision float register and i64 integers to
be placed in a double precision float register. This matches the behavior
of GCC.

For other types the llvm_unreachable is removed to instead trigger an
error message that points out the offending line.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D51614

llvm-svn: 349045
2018-12-13 15:13:29 +00:00
Jonas Paulsson e79b1b986d [SystemZ] Pass copy-hinted regs first from getRegAllocationHints().
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.

Review: Ulrich Weigand.
llvm-svn: 349037
2018-12-13 14:37:05 +00:00
Daniel Sanders 9f3cf55e63 [mir] Serialize DILocation inline when not possible to use a metadata reference
Summary:
Sometimes MIR-level passes create DILocations that were not present in the
LLVM-IR. For example, it may merge two DILocations together to produce a
DILocation that points to line 0.

Previously, the addresses of these DILocations were printed, which prevented the
MIR from being read back into LLVM. With this patch, DILocations will use
metadata references where possible and fall back on serializing them inline like so:
    MOV32mr %stack.0.x.addr, 1, _, 0, _, %0, debug-location !DILocation(line: 1, scope: !15)

Reviewers: aprantl, vsk, arphaman

Reviewed By: aprantl

Subscribers: probinson, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55243

llvm-svn: 349035
2018-12-13 14:25:27 +00:00
Simon Pilgrim 320fd7383f [X86][BWI] Don't custom lower vXi8 rotations.
We always expand to shifts anyhow - test changes are just different scheduling only.

llvm-svn: 349034
2018-12-13 13:44:33 +00:00
Chen Zheng cdbd5bef6d [NFC][PowerPC] add verify-machineinstrs check
After rL349029 and rL348566, sj-ctr-loop.ll is ok for verify-machineinstrs check.

llvm-svn: 349030
2018-12-13 12:55:42 +00:00
Chen Zheng 9c6fa536e0 [PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier.
Differential Revision: https://reviews.llvm.org/D55499

llvm-svn: 349029
2018-12-13 12:25:20 +00:00
Diana Picus 99cd644b6c [ARM GlobalISel] Support exts and truncs for Thumb2
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.

llvm-svn: 349026
2018-12-13 12:06:54 +00:00
Alex Bradbury 919f5fb8ca [RISCV] Add support for the various RISC-V FMA instruction variants
Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd).

The criterion for choosing whether a fused add or subtract is used, as well as
whether the product is negated or not, is whether some of the arguments to the
llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd
instructions were added to avoid the negation being performed using a xor
trick, which prevented the proper FMA forms from being selected and thus
tested.

The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 -
rs3), but they should be correct. The misleading names were inherited from
MIPS, where the negation happens after computing the sum.
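
For reference, the sign conventions of the four instructions can be written out as below. This is a plain Python sketch: ordinary floating-point arithmetic rounds twice, so it does not model the single rounding of a real fused operation, only which terms are negated.

```python
def fmadd(rs1, rs2, rs3):
    """fmadd:  (rs1 * rs2) + rs3"""
    return (rs1 * rs2) + rs3


def fmsub(rs1, rs2, rs3):
    """fmsub:  (rs1 * rs2) - rs3"""
    return (rs1 * rs2) - rs3


def fnmsub(rs1, rs2, rs3):
    """fnmsub: -(rs1 * rs2) + rs3"""
    return -(rs1 * rs2) + rs3


def fnmadd(rs1, rs2, rs3):
    """fnmadd: -(rs1 * rs2) - rs3"""
    return -(rs1 * rs2) - rs3
```

Note that fnmadd negates the product and subtracts rs3, i.e. it equals -(rs1 * rs2 + rs3), which is the naming oddity the paragraph above refers to.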

The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions,
as that depends on TargetLowering::isFMAFasterThanFMulAndFAdd.

Some comments in the test files about what type of instructions are tested
there were updated to better reflect the current content of those test
files.

Differential Revision: https://reviews.llvm.org/D54205
Patch by Luís Marques.

llvm-svn: 349023
2018-12-13 10:49:05 +00:00
Arnaud A. de Grandmaison dfe861087d [AArch64] Catch some more CMN opportunities.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

llvm-svn: 349022
2018-12-13 10:31:32 +00:00
Clement Courbet 76f4ae1092 [CodeGen] Allow memcpy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
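
To illustrate what "small overlapping stores" means, here is a minimal Python sketch. It is not the CodeGen logic; the function names and the single fixed access width are assumptions. A 7-byte copy is covered by two 4-byte accesses at offsets 0 and 3 instead of a 4 + 2 + 1 sequence.

```python
def plan_overlapping(size, width):
    """Offsets of `width`-byte accesses covering `size` bytes.

    The last access is allowed to overlap the previous one so that no
    narrower tail accesses are needed.
    """
    assert size >= width, "sketch only handles size >= access width"
    offsets = list(range(0, size - width + 1, width))
    if offsets[-1] + width < size:
        offsets.append(size - width)    # overlapping tail access
    return offsets


def copy_with_plan(dst, src, size, width):
    """Copy `size` bytes from src to dst using the planned accesses."""
    for off in plan_overlapping(size, width):
        dst[off:off + width] = src[off:off + width]
```

The overlapping bytes are simply written twice with the same value, which is harmless for memcpy and memset as long as unaligned accesses are cheap on the target.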

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 349016
2018-12-13 09:56:19 +00:00
Matt Arsenault 577b9fc543 AMDGPU/GlobalISel: Legalize f64 fadd/fmul
llvm-svn: 349014
2018-12-13 08:27:48 +00:00
Matt Arsenault f38f483bef AMDGPU/GlobalISel: RegBankSelect some simple operations
llvm-svn: 349012
2018-12-13 08:23:51 +00:00
Matt Arsenault 7acf89a21a AMDGPU/GlobalISel: Test cleanups
Remove IR and registers sections

llvm-svn: 349011
2018-12-13 08:11:45 +00:00
Stanislav Mekhanoshin 6071e1aa58 [AMDGPU] Simplify negated condition
Optimize sequence:

  %sel = V_CNDMASK_B32_e64 0, 1, %cc
  %cmp = V_CMP_NE_U32 1, %1
  $vcc = S_AND_B64 $exec, %cmp
  S_CBRANCH_VCC[N]Z
=>
  $vcc = S_ANDN2_B64 $exec, %cc
  S_CBRANCH_VCC[N]Z

It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the
rebuildSetCC().
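
That the two sequences compute the same $vcc can be checked with a small lane-by-lane model. This is a Python sketch under the assumption of a 64-lane wave with %cc as a 64-bit lane mask; the function names are made up for illustration.

```python
WAVE = 64
LANES = (1 << WAVE) - 1


def original_sequence(exec_mask, cc):
    """V_CNDMASK + V_CMP_NE + S_AND_B64, evaluated one lane at a time."""
    cmp_mask = 0
    for lane in range(WAVE):
        cc_bit = (cc >> lane) & 1
        sel = 1 if cc_bit else 0            # %sel = V_CNDMASK_B32_e64 0, 1, %cc
        cmp_bit = 1 if sel != 1 else 0      # %cmp = V_CMP_NE_U32 1, %sel
        cmp_mask |= cmp_bit << lane
    return exec_mask & cmp_mask             # $vcc = S_AND_B64 $exec, %cmp


def optimized_sequence(exec_mask, cc):
    """$vcc = S_ANDN2_B64 $exec, %cc, i.e. exec AND NOT cc."""
    return exec_mask & ~cc & LANES
```

Per lane, the select followed by the compare against 1 is just a negation of the condition bit, which is why a single and-not suffices.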

Differential Revision: https://reviews.llvm.org/D55402

llvm-svn: 349003
2018-12-13 03:17:40 +00:00
Craig Topper d1c61861dd [X86] Don't emit MULX by default with BMI2
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.

Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.

Differential Revision: https://reviews.llvm.org/D55565

llvm-svn: 348975
2018-12-12 21:21:31 +00:00
Craig Topper cd7d7ac0fd [X86] Move stack folding test for MULX to a MIR test. Add a MULX32 case as well
A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX.

Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul.

llvm-svn: 348972
2018-12-12 20:50:24 +00:00
Aakanksha Patil 729309cc89 [AMDGPU] Support for "uniform-work-group-size" attribute
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute.

Differential Revision: https://reviews.llvm.org/D50200

llvm-svn: 348971
2018-12-12 20:49:17 +00:00
Simon Pilgrim 4a641efdc1 [X86] Added missing constant pool checks. NFCI.
So the extra checks in D55600 don't look like a regression.

llvm-svn: 348966
2018-12-12 19:56:38 +00:00
Scott Linder f5b36e56fb [AMDGPU] Emit MessagePack HSA Metadata for v3 code object
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in MessagePack in the code object.

Differential Revision: https://reviews.llvm.org/D48179

llvm-svn: 348963
2018-12-12 19:39:27 +00:00
Craig Topper 4937adf75f [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input.
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.

I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.

Differential Revision: https://reviews.llvm.org/D55414

llvm-svn: 348959
2018-12-12 19:20:21 +00:00
Simon Pilgrim 5864ab2dc0 [X86] Added missing constant pool checks. NFCI.
So the extra checks in D55600 don't look like a regression.

llvm-svn: 348956
2018-12-12 18:53:12 +00:00
Artem Belevich f802b9324a [NVPTX] do not rely on cached subtarget info.
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.

Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.

Differential Revision: https://reviews.llvm.org/D55580

llvm-svn: 348952
2018-12-12 18:31:04 +00:00
Sanjay Patel 44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Neil Henning 76504a4c5e [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores. This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

llvm-svn: 348937
2018-12-12 16:15:21 +00:00
Simon Pilgrim f6c898e12f [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts
If either of the operand elements is zero then we know the result element is going to be zero (even if the other element is undef).

Differential Revision: https://reviews.llvm.org/D55558

llvm-svn: 348926
2018-12-12 13:43:07 +00:00
Simon Pilgrim 125d9b0907 Regenerate knownbits test. NFCI.
A future SimplifyDemandedBits patch will affect this code and I want to ensure the codegen diff is obvious.

llvm-svn: 348925
2018-12-12 13:21:03 +00:00
Piotr Sobczak 3732b4ce25 [AMDGPU] Set metadata access for explicit section
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.

Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye

Reviewed By: dstuttard, kzhuravl

Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye

Differential Revision: https://reviews.llvm.org/D55267

llvm-svn: 348922
2018-12-12 11:20:04 +00:00
Diana Picus 59720b422a [ARM GlobalISel] Select load/store for Thumb2
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.

Also mark them as legal and extract their legalizer test cases to a new
test file.

llvm-svn: 348920
2018-12-12 10:32:15 +00:00
Leonard Chan 118e53fd63 [Intrinsic] Signed Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
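
As a rough illustration of what a signed fixed-point multiply with an explicit scale computes, here is a minimal C++ sketch. The helper name `smulFixSketch` is hypothetical and this is not the intrinsic's lowering; it assumes 32-bit operands, a scale below 32, a result that fits in 32 bits, and an arithmetic right shift for negative values (guaranteed only from C++20 on).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM code): multiply two signed 32-bit fixed-point
// values that both carry `scale` fractional bits.
// Assumes scale < 32 and that the rescaled product fits in 32 bits.
inline int32_t smulFixSketch(int32_t a, int32_t b, unsigned scale) {
  // Widen first so the full 64-bit product is available before rescaling.
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  // Drop `scale` fractional bits; assumes an arithmetic shift for negatives.
  return static_cast<int32_t>(wide >> scale);
}
```

With a scale of 16, the inputs 1.5 (98304) and 2.0 (131072) give 3.0 (196608); with a scale of 0 the helper degenerates to a plain integer multiply.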

Differential Revision: https://reviews.llvm.org/D54719

llvm-svn: 348912
2018-12-12 06:29:14 +00:00
Craig Topper 1fe466689b [X86] Combine vpmovdw+vpacksswb into vpmovdb.
This is similar to the combine we already have for vpmovdw+vpackuswb.

llvm-svn: 348910
2018-12-12 05:56:01 +00:00
Craig Topper 5b69b5e20a [X86] Add a few more fptosi test cases to demonstrate -x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw.
We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added.

llvm-svn: 348909
2018-12-12 05:55:59 +00:00
Craig Topper b51283bfd7 Fix incorrect imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare
Summary:
X86CondBrFolding::analyzeCompare can encounter a SUB32ri instruction, like the one below, that uses a global address as its operand:
  %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
  JNE_1 %bb.41, implicit $eflags

The assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is therefore not correct. Change the assert to an if check so that X86CondBrFolding::analyzeCompare returns false, treating this instruction as not being a compare it can handle.

Patch by Jianping Chen

Reviewers: smaslov, LuoYuanke, liutianle, Jianping

Reviewed By: Jianping

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D54250

llvm-svn: 348853
2018-12-11 15:32:14 +00:00
Clement Courbet 8b6434bbb9 Revert r348843 "[CodeGen] Allow memcpy/memset to generate small overlapping stores."
Breaks ARM/memcpy-inline.ll

llvm-svn: 348844
2018-12-11 13:38:43 +00:00
Clement Courbet 93b3445770 [CodeGen] Allow memcpy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 348843
2018-12-11 13:15:56 +00:00
Simon Pilgrim f6371f5f23 [TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits
Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction.

Part of PR39689.

llvm-svn: 348839
2018-12-11 11:08:40 +00:00
Craig Topper 4bd93fa5bb [X86] Switch the 64-bit mulx schedule test to use inline assembly.
I'm not sure we should always prefer MULX over MUL. So making the MULX guaranteed with inline assembly.

llvm-svn: 348833
2018-12-11 07:41:06 +00:00
Heejin Ahn be5e5874f6 [WebAssembly] Add '.eventtype' directive support
Summary:
This patch supports printing and parsing the `.eventtype` directive,
using the same syntax as `.functype`.

Reviewers: aardappel, sbc100

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55353

llvm-svn: 348818
2018-12-11 01:11:04 +00:00
Krzysztof Parzyszek 9f003f9262 [Hexagon] Couple of fixes in optimize addressing mode
- Check if an operand is an immediate before calling getImm. Some operands
  that take constant values can actually have global symbols or other
  constant expressions.
- When a load-constant instruction can be folded into users, make sure to
  only delete it when all users have been successfully converted.

llvm-svn: 348802
2018-12-10 21:56:04 +00:00
David Green bd72be0b44 [Targets] Fixup incorrect targets in codemodel tests
llvm-svn: 348796
2018-12-10 20:55:34 +00:00
Krzysztof Parzyszek c1b2d5905a Revert "[Hexagon] Check if operand is an immediate before getImm"
This reverts r348787. The patch wasn't quite correct.

llvm-svn: 348792
2018-12-10 19:30:08 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Krzysztof Parzyszek c6e9380a56 [Hexagon] Check if operand is an immediate before getImm
llvm-svn: 348787
2018-12-10 18:39:47 +00:00
Simon Pilgrim fc2c9af99c [TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.

Zero constant folding will be handled in a future patch - it's a little trickier as we often have bitcasted zero values.

Differential Revision: https://reviews.llvm.org/D55511

llvm-svn: 348784
2018-12-10 18:29:46 +00:00
Neil Henning e448351b77 [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D.
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.

Differential Revision: https://reviews.llvm.org/D55367

llvm-svn: 348771
2018-12-10 16:35:53 +00:00
Sanjay Patel 45ae6b50d8 [x86] add tests for LowerVSETCC with min/max; NFC
llvm-svn: 348769
2018-12-10 16:28:30 +00:00
Francis Visoiu Mistrih 0ad1af72cd [DAGCombiner] Simplify test case from r348759
Thanks Simon for pointing that out.

llvm-svn: 348765
2018-12-10 16:04:56 +00:00
Petr Pavlu 84e89ff06f [GlobalISel] Set stack protector index when translating Intrinsic::stackprotector
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector, similarly to what SelectionDAG does when
processing the same intrinsic.

Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.

Differential Revision: https://reviews.llvm.org/D55418

llvm-svn: 348761
2018-12-10 15:15:05 +00:00
Francis Visoiu Mistrih 753efe3584 [DAGCombiner] Use the result value type in visitCONCAT_VECTORS
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.

With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
  0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
    0x7ff19d000b50: f64 = Register %1
In function: d

Differential Revision: https://reviews.llvm.org/D55507

llvm-svn: 348759
2018-12-10 14:31:34 +00:00
Tim Corringham 4c4d2fe280 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

llvm-svn: 348754
2018-12-10 12:06:10 +00:00
Jeremy Morse 045c67769d [DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.

The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
 * The corresponding SDNode is now invalid
 * This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
 * The SDNode has been invalidated and we should emit "DBG_VALUE undef"
 * The SDNode has been invalidated but the debug data was salvaged, don't
   emit anything for this SDDbgValue
 * This SDDbgValue has been emitted

This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.

Differential Revision: https://reviews.llvm.org/D55372

llvm-svn: 348751
2018-12-10 11:20:47 +00:00
Nikita Popov e79477895e [X86] Fix AvoidStoreForwardingBlocks pass for negative displacements
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.

The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
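
The arithmetic can be checked with a small standalone C++ sketch. The function names `oldSize1` and `newSize1` are hypothetical; only the two expressions are taken from the message above, and the surrounding pass logic is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Size of the first copy as computed before the fix (expression quoted above).
inline int64_t oldSize1(int64_t LdDisp1, int64_t LdDisp2) {
  return std::abs(std::abs(LdDisp2) - std::abs(LdDisp1));
}

// Size of the first copy after the fix: a plain difference of displacements.
// Assumes LdDisp2 >= LdDisp1, which the message says earlier code ensures.
inline int64_t newSize1(int64_t LdDisp1, int64_t LdDisp2) {
  return LdDisp2 - LdDisp1;
}
```

For displacements -1 and 1 the old expression yields 0 while the plain difference yields 2, so two bytes were skipped; when both displacements have the same sign (for example 4 and 12) the two expressions agree.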

Differential Revision: https://reviews.llvm.org/D55485

llvm-svn: 348750
2018-12-10 10:16:50 +00:00
Craig Topper 02b614abc8 [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic.
Both intrinsics do the exact same thing so we really only need one.

Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrade from the previous signature that was new in 8.0. I've also renamed the subborrow intrinsic for consistency.

llvm-svn: 348737
2018-12-10 06:07:50 +00:00
Brian Gesiak b963c5150d [AMDGPU] Fix discarded result of addAttribute
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.

However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.

Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.
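
The pitfall can be reproduced with a tiny immutable class. The sketch below uses a hypothetical `AttrListSketch` type, not the real `llvm::AttributeList` API, and standard `[[nodiscard]]` in place of the `LLVM_NODISCARD` macro.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an immutable attribute list (not the LLVM class).
class AttrListSketch {
  std::set<std::string> Attrs;

public:
  // Returns a new list; the receiver is left unchanged.
  [[nodiscard]] AttrListSketch addAttribute(const std::string &Name) const {
    AttrListSketch Copy = *this;
    Copy.Attrs.insert(Name);
    return Copy;
  }

  bool hasAttribute(const std::string &Name) const {
    return Attrs.count(Name) != 0;
  }
};

// Buggy pattern: `AL.addAttribute("readonly");` silently drops the result.
// Correct pattern: keep and return the value that addAttribute produces.
inline AttrListSketch withReadOnlyNoUnwind(const AttrListSketch &AL) {
  return AL.addAttribute("readonly").addAttribute("nounwind");
}
```

Because `addAttribute` is marked `[[nodiscard]]`, a compiler that honours the attribute warns on the buggy pattern, which is how this kind of mistake surfaces.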

Reviewers: rampitec, vpykhtin, rnk

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55435

llvm-svn: 348732
2018-12-09 21:56:50 +00:00
Craig Topper 2b09d17d93 [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB.
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother.
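
A small 8-bit model shows why adding -1 to the carry-in value reproduces it in the carry flag. The types and function names below are hypothetical and only model the flag arithmetic, not the X86 lowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of an add that also reports the carry flag.
struct AddResult {
  uint8_t Sum;
  bool CarryOut;
};

inline AddResult addWithCarry(uint8_t A, uint8_t B, bool CarryIn) {
  unsigned Wide = unsigned(A) + unsigned(B) + (CarryIn ? 1u : 0u);
  return {static_cast<uint8_t>(Wide), Wide > 0xFFu};
}

// Adding -1 (0xFF) to the carry-in byte carries out exactly when the byte is
// non-zero, so the flag ends up mirroring the incoming carry value.
inline bool carryFlagFromByte(uint8_t CarryByte) {
  return addWithCarry(CarryByte, 0xFF, false).CarryOut;
}
```

When the carry-in is a known zero, this extra add has nothing to contribute, so a plain flag-setting add computes the same sum and carry-out.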

This should go a long way towards fixing PR24545.

llvm-svn: 348727
2018-12-09 18:02:37 +00:00
Sanjay Patel 099beb25e4 [x86] regenerate test checks; NFC
llvm-svn: 348723
2018-12-09 14:47:53 +00:00
Sanjay Patel 19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalISel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Nikita Popov 3192449412 [X86] Add test for PR39926; NFC
The test file shows a case where the avoid store forwarding block
pass fails to copy a range (-1..1) when the load displacement
changes sign.

Baseline test for D55485.

llvm-svn: 348712
2018-12-09 12:02:56 +00:00
Sanjay Patel e767bf4468 [DAGCombiner] re-enable truncation of binops
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.

The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)

Original commit message for r347917:

The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.

llvm-svn: 348706
2018-12-08 16:07:38 +00:00
Sanjay Patel 04461ee821 [x86] add 32-bit RUN for tests and test with opaque constants; NFC
The opaque constant test is reduced from a Chrome file that
infinite-looped with rL347917.

llvm-svn: 348705
2018-12-08 15:34:09 +00:00
Craig Topper 531103f622 [X86] Remove the XFAILed test added in r348620
It seems to be unexpectedly passing on some bots probably because it requires asserts to fail, but doesn't say that. But we already have a patch in review to make it not xfail so I'd rather just focus on getting it passing rather than trying to figure out an unexpected pass.

llvm-svn: 348661
2018-12-07 22:16:40 +00:00
Matt Arsenault b5613ecf17 AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments
We were still using the rounded down offset and alignment even though
they aren't handled because you can't trivially bitcast the loaded
value.

llvm-svn: 348658
2018-12-07 22:12:17 +00:00
Jessica Paquette cc4b6920b3 [GlobalISel] Add IR translation support for the @llvm.log10 intrinsic
This adds IR translation support for @llvm.log10 and updates relevant tests.

https://reviews.llvm.org/D55392

llvm-svn: 348657
2018-12-07 22:08:02 +00:00
Krzysztof Parzyszek b754f7a2e0 [Hexagon] Fix post-ra expansion of PS_wselect
llvm-svn: 348655
2018-12-07 22:00:53 +00:00
Pete Cooper 782a490dfb Follow-up from r348441 to add the rest of the objc ARC intrinsics.
This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods.

llvm-svn: 348646
2018-12-07 21:28:47 +00:00
Matt Arsenault fab7d27f0e AMDGPU: Use gfx9 instead of gfx8 in a test
They are the same for the purposes of the tests,
but it's much easier to write check lines for
the memory instructions with offsets.

llvm-svn: 348643
2018-12-07 20:57:43 +00:00
Matt Arsenault ce2e053134 AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
llvm-svn: 348625
2018-12-07 18:41:39 +00:00
Craig Topper ba3ab78291 [X86] Initialize and Register X86CondBrFoldingPass
Initialize and register X86CondBrFoldingPass so that it can be run with the --run-pass option. This makes it possible to test an incorrect assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D55412

llvm-svn: 348620
2018-12-07 18:10:34 +00:00
Matt Arsenault ca8eb0b672 AMDGPU: Remove llvm.SI.tbuffer.store
llvm-svn: 348619
2018-12-07 18:03:47 +00:00
Matt Arsenault 3ff764a944 AMDGPU: Remove llvm.SI.buffer.load.dword
llvm-svn: 348616
2018-12-07 17:46:20 +00:00
Matt Arsenault aa9bcd56b1 AMDGPU: Remove llvm.AMDGPU.kill
This is the last of the old AMDGPU intrinsics.

llvm-svn: 348615
2018-12-07 17:46:16 +00:00
Sanjay Patel 3af4ae9735 [DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively 
reverting r347917 and its follow-up r348195 while we
investigate the bug.

llvm-svn: 348604
2018-12-07 15:47:52 +00:00
Graham Sellers b297379ef0 [AMDGPU] Shrink scalar AND, OR, XOR instructions
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.

It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x

In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7fffffff -> bitset0 x, 31, and or x, 0x80000000 -> bitset1 x, 31).
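
The first two rewrites depend on recognising an immediate with exactly one bit set (for OR) or exactly one bit clear (for AND). A minimal sketch of that check follows; the helper names are hypothetical and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns n if Imm == (1u << n), otherwise -1.
inline int singleSetBit(uint32_t Imm) {
  // A power of two has exactly one bit set, so Imm & (Imm - 1) is zero.
  if (Imm == 0 || (Imm & (Imm - 1)) != 0)
    return -1;
  int N = 0;
  while ((Imm >> N) != 1u)
    ++N;
  return N;
}

// Hypothetical helper: returns n if Imm == ~(1u << n), otherwise -1.
inline int singleClearBit(uint32_t Imm) {
  return singleSetBit(~Imm);
}
```

For the sign-bit masks, `singleSetBit(0x80000000)` and `singleClearBit(0x7fffffff)` both return 31, matching the bitset1 and bitset0 forms.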

llvm-svn: 348601
2018-12-07 15:33:21 +00:00
Tim Northover 4bf394be3a ARM: use correct offset from base pointer (r6) in call frame regions.
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.

llvm-svn: 348591
2018-12-07 13:43:55 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Simon Pilgrim d498dee7a2 [SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTOR
Fixes an assertion:

llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed.

Committed on behalf of: @pendingchaos (Rhys Perry)

Differential Revision: https://reviews.llvm.org/D55223

llvm-svn: 348574
2018-12-07 09:18:44 +00:00
Zi Xuan Wu cf4d477b0b [PowerPC] Fix machine verifier assert caused by a missing undef register flag
Fix an assert about using an undefined physical register in the machine instruction verifier pass.
The cause is that the undef register flag is dropped by a transformation in the If Conversion pass.

```
Bad machine code: Using an undefined physical register 
- function:    func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0:   killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```

Other existing test cases have the same issue, so I added the -verify-machineinstrs option to enable verification.

Differential Revision: https://reviews.llvm.org/D55408

llvm-svn: 348566
2018-12-07 05:25:16 +00:00
Sanjay Patel c6441c8547 [DAGCombiner] use root SDLoc for all nodes created by logic fold
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.

I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.

llvm-svn: 348552
2018-12-07 00:01:57 +00:00
Sanjay Patel 70af85b0ac [DAGCombiner] don't group bswap with casts in logic hoisting fold
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions. 

Earlier commits in this series:
rL348501
rL348508
rL348518

llvm-svn: 348534
2018-12-06 22:10:44 +00:00
Sanjay Patel b7156fb504 [x86] add test for vector bitwise-logic-of-bswaps; NFC
llvm-svn: 348530
2018-12-06 21:56:30 +00:00
Andrea Di Biagio 52a2bac583 [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
    --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
-mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion
pass is disabled.

Differential Revision: https://reviews.llvm.org/D55274

llvm-svn: 348522
2018-12-06 19:55:38 +00:00
Sanjay Patel bfc7ffa40f [DAGCombiner] don't hoist logic op if operands have other uses, part 2
The PPC test with 2 extra uses seems clearly better by avoiding this transform. 
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.

llvm-svn: 348518
2018-12-06 19:18:56 +00:00
Sanjay Patel 273b778997 [PowerPC] add tests for hoisting bitwise logic; NFC
llvm-svn: 348516
2018-12-06 19:05:19 +00:00
Adrian Prantl fbeeac0e1e Reapply "Adapt gcov to changes in CFE."
This reverts commit r348203 and reapplies D55085 with an additional
GCOV bugfix to make the change NFC for relative file paths in .gcno files.

Thanks to Ilya Biryukov for additional testing!

Original commit message:

    Update Diagnostic handling for changes in CFE.

    The clang frontend no longer emits the current working directory for
    DIFiles containing an absolute path in the filename: and will move the
    common prefix between current working directory and the file into the
    directory: component.

    https://reviews.llvm.org/D55085

llvm-svn: 348512
2018-12-06 18:44:48 +00:00
Sanjay Patel c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel db6396b892 [x86] add test for hoistLogicOpWithSameOpcodeHands with extra uses; NFC
llvm-svn: 348506
2018-12-06 18:06:10 +00:00
Sam Parker 993326da19 [ARM][NFC] Adding another test for armcgp
llvm-svn: 348489
2018-12-06 15:13:44 +00:00
Nicolai Haehnle ca4a32945f AMDGPU: Generate VALU ThreeOp Integer instructions
Summary:
Original patch by: Fabian Wahlster <razor@singul4rity.com>

Change-Id: I148f692a88432541fad468963f58da9ddf79fac5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits

Differential Revision: https://reviews.llvm.org/D51995

llvm-svn: 348488
2018-12-06 14:33:40 +00:00
Sam Parker 9fa793dbe4 [ARM][NFC] Added extra arm-cgp test
llvm-svn: 348482
2018-12-06 12:58:58 +00:00
Clement Courbet fee1040f04 [X86][NFC] Convert memcpy/memset tests to update_llc_test_checks.
llvm-svn: 348477
2018-12-06 10:07:12 +00:00
Clement Courbet 52d382488f [X86][NFC] Add more tests for memset.
llvm-svn: 348465
2018-12-06 08:48:06 +00:00
Matthias Braun d041212c07 AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.

Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.

This fixes http://llvm.org/PR39550

Differential Revision: https://reviews.llvm.org/D54137

llvm-svn: 348444
2018-12-06 01:40:23 +00:00
Pete Cooper e13d0992dc Add objc.* ARC intrinsics and codegen them to their runtime methods.
Reviewers: erik.pilkington, ahatanak

Differential Revision: https://reviews.llvm.org/D55233

llvm-svn: 348441
2018-12-06 00:52:54 +00:00
Amara Emerson a0b15d8f3e [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTORS opcodes.
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus complex to deal with throughout the GISel
pipeline.

G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.

G_CONCAT_VECTORS creates a vector by concatenating smaller, uniformly typed,
vectors together.

These will be used in a subsequent commit. This commit just adds the initial
infrastructure.

Differential Revision: https://reviews.llvm.org/D53594

llvm-svn: 348430
2018-12-05 23:53:30 +00:00
Jessica Paquette 962b3ae659 [MachineOutliner] Outline functions by order of benefit
Mostly NFC, only change is the order of outlined function names.

Loop over the outlined functions instead of walking the candidate list.

This is a bit easier to understand. It's far more natural to create a function,
then replace all of its occurrences with calls than the other way around.

The functions outlined after this do not change, but their names will be
decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the
most beneficial function, rather than the first one seen.

This makes it easier to enforce an ordering on the outlined functions. So,
this also adds a test to make sure that the ordering works as expected.

llvm-svn: 348414
2018-12-05 21:36:04 +00:00
Krzysztof Parzyszek 8eb394d764 [Hexagon] Add intrinsics for Hexagon V66
llvm-svn: 348413
2018-12-05 21:14:51 +00:00
Krzysztof Parzyszek 545a68ca4b [Hexagon] Add instruction definitions for Hexagon V66
llvm-svn: 348411
2018-12-05 21:01:07 +00:00
Simon Pilgrim c10590f6f9 [X86][SSE] Fix a copy+paste typo that was folding the sext/zext of partial vectors
llvm-svn: 348403
2018-12-05 19:32:19 +00:00
Matt Arsenault b3e14de487 AMDGPU: Fix using old address spaces in some tests
llvm-svn: 348385
2018-12-05 17:34:59 +00:00
Sanjay Patel 33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Andrea Di Biagio 3998bebc12 [X86] Add test case to show missed opportunity to combine a concat_vector into a scalar_to_vector. NFC
This is a test for D55274.

llvm-svn: 348380
2018-12-05 16:23:27 +00:00
Chandler Carruth 71c14a36a2 [SLH] Fix a nasty bug in SLH.
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.

The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.

The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.

llvm-svn: 348374
2018-12-05 15:42:11 +00:00
Chandler Carruth e3ea164659 [SLH] Regenerate tests with --no_x86_scrub_rip to restore the higher
fidelity checking of RIP-based references to basic blocks and other
labels.

These labels are super important for SLH tests so we should keep them
readable in the test cases.

llvm-svn: 348373
2018-12-05 15:41:13 +00:00
Valery Pykhtin 5b4db77b13 [AMDGPU]: Turn on the DPP combiner by default
Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348371
2018-12-05 15:21:17 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Diana Picus 8a1b4f57c9 [ARM GlobalISel] Implement call lowering for Thumb2
The only things that are different from arm are:
* different opcodes for calls and returns
* Thumb calls take predicate operands

llvm-svn: 348347
2018-12-05 10:35:28 +00:00
Saleem Abdulrasool efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Craig Topper 3991089816 [X86] Add narrow vector test cases to vector-reduce* tests. Add copies of the tests with -x86-experimental-vector-widening-legalization
llvm-svn: 348334
2018-12-05 06:29:44 +00:00
Craig Topper 6934202dc0 [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting PhysRegDef and then later in the same loop visit the register again. At that point we see that PhysRegDef is set from the earlier iteration, so we wrongly set PhysRegClobber.

This patch splits the loop so we have one loop that uses the previous value of PhysRegDef to update PhysRegClobber and a second loop that updates PhysRegDef.
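
The effect of splitting the loop can be sketched in a few lines of Python. This is only an illustration under assumptions: the register names, the `aliases` list with a repeated entry, and the two helper functions are hypothetical stand-ins, not the MachineLICM code itself.

```python
def update_single_loop(aliases, defs, clobbers):
    # Mirrors the problematic shape: the def set is read and written in the
    # same loop, so a register visited twice looks like it was already defined.
    for reg in aliases:
        if reg in defs:
            clobbers.add(reg)
        defs.add(reg)


def update_split_loops(aliases, defs, clobbers):
    # First loop: consult only the defs recorded before this instruction.
    for reg in aliases:
        if reg in defs:
            clobbers.add(reg)
    # Second loop: record the new defs.
    for reg in aliases:
        defs.add(reg)


# Hypothetical alias walk that yields "eax" twice.
aliases = ["eax", "ax", "eax"]

single_defs, single_clobbers = set(), set()
update_single_loop(aliases, single_defs, single_clobbers)

split_defs, split_clobbers = set(), set()
update_split_loops(aliases, split_defs, split_clobbers)
```

With the single loop, the repeated visit marks the register as clobbered even though nothing defined it earlier; with the split loops the clobber set stays empty.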

The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at the end of the loops. And in the branch-relaxation test I think there is now an "and vcc, exec, -1" instruction that wasn't there before.

Differential Revision: https://reviews.llvm.org/D55102

llvm-svn: 348330
2018-12-05 03:41:26 +00:00
Amara Emerson 8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Stefan Pintilie 46f840f286 [PowerPC] Make no-PIC default to match GCC - LLVM
Change the default for PowerPC LE to -fno-PIC.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 348298
2018-12-04 20:14:57 +00:00
Matt Arsenault b17241b12d Move llc-start-stop-instance to x86
Avoid bot failures where the host pass
setup might not have 2 dead-mi-elimination runs

llvm-svn: 348290
2018-12-04 18:19:08 +00:00
Matt Arsenault 43153024ab MIR: Add method to stop after specific runs of passes
Currently if you use -{start,stop}-{before,after}, it picks
the first instance with the matching pass name. If you run
the same pass multiple times, there's no way to distinguish them.

Allow specifying a run index with ,N to specify which you mean.
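
A minimal sketch of how a `name,N` argument could select one run of a repeated pass. The function names, the pipeline contents other than dead-mi-elimination, and the zero-based counting are assumptions made for illustration; this is not the llc option parser.

```python
def parse_pass_spec(spec):
    """Split 'pass-name,N' into (name, N); a bare name means run 0 (assumed)."""
    name, sep, index = spec.partition(",")
    return name, int(index) if sep else 0


def find_run(pipeline, spec):
    """Return the position of the requested run of the pass in the pipeline."""
    name, wanted = parse_pass_spec(spec)
    seen = 0
    for position, pass_name in enumerate(pipeline):
        if pass_name == name:
            if seen == wanted:
                return position
            seen += 1
    raise ValueError(f"pass {name!r} does not have a run with index {wanted}")


# Hypothetical pipeline in which the same pass runs twice.
pipeline = [
    "isel",
    "dead-mi-elimination",
    "machine-licm",
    "dead-mi-elimination",
    "regalloc",
]

first_run = find_run(pipeline, "dead-mi-elimination")
second_run = find_run(pipeline, "dead-mi-elimination,1")
```

Without the index, only the first matching run can be named; the index makes the later run addressable.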

llvm-svn: 348285
2018-12-04 17:45:12 +00:00
Simon Pilgrim 07843640d5 [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK
Moves the existing SimplifyDemandedBits call out of combineMOVMSK and adds a SimplifyDemandedVectorElts call based on the sign bits we need.

llvm-svn: 348282
2018-12-04 16:52:32 +00:00
Ilya Biryukov 449a7f0dbb Revert "Adapt gcov to changes in CFE."
This reverts commit r348203.
Reason: this produces absolute paths in .gcno files, breaking us
internally as we rely on them being consistent with the filenames passed
in the command line.

Also reverts r348157 and r348155 to account for revert of r348154 in
clang repository.

llvm-svn: 348279
2018-12-04 16:30:31 +00:00
Simon Pilgrim e82c3dab12 [X86][SSE] Add MOVMSK demandedbits/elts tests
llvm-svn: 348277
2018-12-04 16:01:25 +00:00
Clement Courbet 7925d58eae [X86][NFC] Add more constant-size memcmp tests.
llvm-svn: 348257
2018-12-04 12:35:51 +00:00
Simon Pilgrim 0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
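
The difference between selecting after conversion and pre-selecting the input can be modelled roughly as follows. This is a sketch under assumptions: a Python `OverflowError` stands in for the FP exception, the conversion is 64-bit, and the function names are hypothetical.

```python
SIGN_BIT = 1 << 63
LIMIT = float(SIGN_BIT)  # 2**63, the first value a signed conversion cannot hold


def fp_to_sint64(x):
    """Signed conversion that 'traps' on out-of-range input (stand-in for an FPE)."""
    value = int(x)
    if not -SIGN_BIT <= value < SIGN_BIT:
        raise OverflowError("out-of-range FP_TO_SINT")
    return value


def fp_to_uint64_select_after(x):
    # Converts both candidates and then selects, so the direct conversion of a
    # large input is executed even though its result is discarded.
    direct = fp_to_sint64(x)
    shifted = fp_to_sint64(x - LIMIT) ^ SIGN_BIT
    return direct if x < LIMIT else shifted


def fp_to_uint64_preselect(x):
    # Selects the conversion input and the offset first, so only an in-range
    # value ever reaches the signed conversion.
    in_range = x < LIMIT
    src = x if in_range else x - LIMIT
    offset = 0 if in_range else SIGN_BIT
    return fp_to_sint64(src) ^ offset


try:
    fp_to_uint64_select_after(2.0 ** 63)
    select_after_trapped = False
except OverflowError:
    select_after_trapped = True
```

Both variants agree on small inputs; only the pre-select form avoids the spurious trap for inputs at or above 2**63.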

The X87 cases don't need the strict flag but generate much nicer code with it.

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim 666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Krzysztof Parzyszek 44c1f81b27 [Hexagon] Switch to auto-generated intrinsic definitions and patterns
llvm-svn: 348206
2018-12-03 22:40:36 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Craig Topper 5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a result vector type smaller than 128 bits are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Adrian Prantl 0f873eb80a Update Diagnostic handling for changes in CFE.
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.

https://reviews.llvm.org/D55085

llvm-svn: 348155
2018-12-03 17:55:29 +00:00
Simon Pilgrim fb39916048 Fix line endings. NFCI.
llvm-svn: 348146
2018-12-03 14:55:09 +00:00
Ron Lieberman 16de4fd2eb [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
precludes the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
  %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec


Differential Revision: https://reviews.llvm.org/D54882

llvm-svn: 348132
2018-12-03 13:04:54 +00:00
Tim Northover 5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Petr Pavlu d336c4eb61 [GlobalISel] Fix test irtranslator-stackprotect-check.ll
Fix for commit r347862. Use correct AArch64 triple in test
CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll.

llvm-svn: 348111
2018-12-03 09:28:28 +00:00
Sjoerd Meijer 5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Kang Zhang 51986417f9 [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
Four instructions have an inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 348109
2018-12-03 03:32:57 +00:00
Craig Topper 959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Sanjay Patel b205606d3e [SelectionDAG] fold constant with undef vector per element
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C --> undef'.

But the most practical improvement is likely as shown in the tests here - 
for FP, we were overconstraining undef lanes to NaN, and that can prevent 
vector simplifications/narrowing (see D51553).

llvm-svn: 348090
2018-12-02 13:48:42 +00:00
Craig Topper 4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Craig Topper eff43f6ae3 [X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch.
llvm-svn: 348082
2018-12-01 21:53:08 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Jessica Paquette adcc410f65 Replace w16/w17 in machine-outliner.mir with w11/w12
These registers should not be used here, since they are interprocedural
scratch registers in AArch64.

llvm-svn: 348080
2018-12-01 21:23:58 +00:00
Craig Topper f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Graham Sellers ba559ac058 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.
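
The identity can be checked on 64-bit values with a short Python sketch. The helper names and the sample operands are arbitrary choices for illustration; masking to 64 bits models fixed-width registers.

```python
MASK64 = (1 << 64) - 1


def not64(x):
    """Bitwise NOT restricted to 64 bits."""
    return ~x & MASK64


def xnor64(x, y):
    """XNOR written as NOT applied to the XOR result."""
    return not64(x ^ y)


# Arbitrary sample operands.
a = 0x0123456789ABCDEF
b = 0xFEDCBA9800000001

not_of_xor = xnor64(a, b)
not_on_lhs = not64(a) ^ b
not_on_rhs = a ^ not64(b)
```

All three forms give the same value, which is why the NOT can be applied to whichever operand is scalar.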

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
2018-12-01 12:27:53 +00:00
Simon Pilgrim e017ed3245 [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.
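
A simplified sketch of deriving demanded elements from a demanded-bits mask, comparing the strict rule (every bit of the element demanded) with the relaxed rule (any bit demanded). The function name, parameters, and sample mask are hypothetical; element 0 is assumed to occupy the low bits.

```python
def demanded_elts(demanded_bits, num_elts, elt_bits, require_all_bits):
    """Return one boolean per sub-element saying whether it is demanded."""
    elt_mask = (1 << elt_bits) - 1
    result = []
    for i in range(num_elts):
        bits = (demanded_bits >> (i * elt_bits)) & elt_mask
        if require_all_bits:
            result.append(bits == elt_mask)  # strict: fully covered only
        else:
            result.append(bits != 0)         # relaxed: any demanded bit
    return result


# 32-bit value viewed as four 8-bit elements; element 0 is partially demanded.
mask = 0x00FF00F0
strict = demanded_elts(mask, num_elts=4, elt_bits=8, require_all_bits=True)
relaxed = demanded_elts(mask, num_elts=4, elt_bits=8, require_all_bits=False)
```

Under the relaxed rule the partially demanded element 0 is kept, while elements with no demanded bits can still be treated as unused.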

Differential Revision: https://reviews.llvm.org/D54761

llvm-svn: 348073
2018-12-01 12:08:55 +00:00
Craig Topper 2d6324c3cb [X86] Remove stale FIXME from test case. NFC
This was fixed in r346581. I just forgot to remove it.

llvm-svn: 348069
2018-12-01 07:45:36 +00:00
Alex Bradbury 757d296222 [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, 
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. 
SRAW assumed that (sext_inreg foo, i32) could only be produced when 
sign-extending an i32. However, it can be produced by input such as:

define i64 @tricky_ashr(i64 %a, i64 %b) {
  %1 = shl i64 %a, 32
  %2 = ashr i64 %1, 32
  %3 = ashr i64 %2, %b
  ret i64 %3
}

It's important not to select sraw in the above case, because sraw only uses 
the lower 5 bits of the shift amount, while a shift of 32-63 would be valid.
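
The mismatch can be reproduced with a small Python model. The helpers below are an illustrative sketch, not generated code: `tricky_ashr` follows the IR above, and `sraw` models the instruction as an arithmetic shift of the low 32 bits by the low 5 bits of the shift amount, sign-extended to 64 bits.

```python
def sext32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def tricky_ashr(a, b):
    # shl 32 followed by ashr 32 sign-extends the low 32 bits; the final
    # ashr then uses the full 64-bit shift amount b (0..63).
    return sext32(a) >> b


def sraw(rs1, rs2):
    # Only the low 5 bits of the shift amount are used.
    return sext32(sext32(rs1) >> (rs2 & 0x1F))


small_shift_matches = tricky_ashr(0x12345678, 4) == sraw(0x12345678, 4)
expected = tricky_ashr(0x12345678, 40)
miscompiled = sraw(0x12345678, 40)
```

For shift amounts below 32 the two agree, but for a shift of 40 the IR yields 0 while the sraw model shifts by only 8.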

Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be 
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.

This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and 
adds test cases that would demonstrate a miscompile if the incorrect patterns 
were re-added.

llvm-svn: 348067
2018-12-01 05:00:00 +00:00
Artem Belevich e5664b1559 [NVPTX] Add lowering of i128 numbers as struct fields
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.

Author: Denys Zariaiev<denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

llvm-svn: 348057
2018-12-01 00:21:52 +00:00
Nicolai Haehnle a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Sanjay Patel 39298cae9f [x86] add tests for undef + partial undef constant folding; NFC
Keep this file sync'd with the instsimplify version (rL348045).

llvm-svn: 348047
2018-11-30 22:54:33 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Craig Topper 4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper 8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities, as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Sanjay Patel 1901a12e76 [SelectionDAG] fold FP binops with 2 undef operands to undef
llvm-svn: 348016
2018-11-30 18:38:52 +00:00
Ron Lieberman f48e43bbf7 [AMDGPU] Disable SReg Global LD/ST, perf regression
Differential Revision: https://reviews.llvm.org/D55093

llvm-svn: 348014
2018-11-30 18:29:17 +00:00
Sanjay Patel 1cfb796b58 [x86] add tests for fake vector FP ops; NFC
llvm-svn: 348002
2018-11-30 16:50:08 +00:00
Than McIntosh 0e0a8a3fee [CodeGen] Prefer static frame index for STATEPOINT liveness args
Summary:
If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on stack), prefer to use this
fixed location even if the address is also in a register. If we use
the register it will generate a spill, which is not necessary
since the fixed frame index can be directly recorded in the stack
map.

Patch by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, niravd, reames

Reviewed By: reames

Subscribers: cherryyz, reames, anna, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53889

llvm-svn: 347998
2018-11-30 16:22:41 +00:00
Valery Pykhtin 3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Alex Bradbury fca95cfee9 [SelectionDAG] Support result type promotion for FLT_ROUNDS_
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.

Differential Revision: https://reviews.llvm.org/D53820

llvm-svn: 347986
2018-11-30 13:18:33 +00:00
Alex Bradbury bd24c7b045 [SelectionDAG] Support promotion of PREFETCH operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.

Differential Revision: https://reviews.llvm.org/D53281

llvm-svn: 347980
2018-11-30 10:06:31 +00:00
Alex Bradbury 36e0fd1d39 [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operand.

Differential Revision: https://reviews.llvm.org/D53279

llvm-svn: 347978
2018-11-30 10:02:06 +00:00
Alex Bradbury e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.
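
Why sign-extension is a valid choice for promoted comparison operands can be illustrated with a brute-force check over a few i32 values. This is a sketch with hypothetical helper names: it only shows that an unsigned 64-bit comparison gives the same answer whether both operands are zero-extended or both are sign-extended.

```python
def zext32_to_64(x):
    """Zero-extend the low 32 bits to a 64-bit unsigned value."""
    return x & 0xFFFFFFFF


def sext32_to_64(x):
    """Sign-extend the low 32 bits, returned as a 64-bit unsigned bit pattern."""
    x &= 0xFFFFFFFF
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0x80000001, 0xFFFFFFFF]

# Unsigned less-than and equality must agree for every pair of samples.
ult_agrees = all(
    (zext32_to_64(a) < zext32_to_64(b)) == (sext32_to_64(a) < sext32_to_64(b))
    for a in samples
    for b in samples
)
eq_agrees = all(
    (zext32_to_64(a) == zext32_to_64(b)) == (sext32_to_64(a) == sext32_to_64(b))
    for a in samples
    for b in samples
)
```

Sign-extension moves every value with the top bit set above every value without it, so the unsigned ordering of the original 32-bit values is preserved.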

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Alex Bradbury bc96a98ed0 [RISCV] Introduce codegen patterns for instructions introduced in RV64I
As discussed in the RFC 
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit 
RISC-V has i64 as the only legal integer type.  This patch introduces patterns 
to support codegen of the new instructions 
introduced in RV64I: addw, addiw, subw, sllw, slliw, srlw, srliw, sraw, 
sraiw, ld, sd.

Custom selection code is needed for srliw as SimplifyDemandedBits will remove 
lower bits from the mask, meaning the obvious pattern won't work:

def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
          (SRLIW GPR:$rs1, uimm5:$shamt)>;

This is sufficient to compile and execute all of the GCC torture suite for 
RV64I other than those files using frameaddr or returnaddr intrinsics 
(LegalizeDAG doesn't know how to promote the operands - a future patch 
addresses this).

When promoting i32 sltu/sltiu operands, it would be more efficient to use 
sign-extension rather than zero-extension for RV64. A future patch adds a hook 
to allow this.

Differential Revision: https://reviews.llvm.org/D52977

llvm-svn: 347973
2018-11-30 09:38:44 +00:00
Craig Topper a2133061c0 [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
llvm-svn: 347967
2018-11-30 08:32:05 +00:00
Sjoerd Meijer ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.
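
For reference, one common way to expand a signed division by a power of two into shifts and an add is sketched below. The exact ARM sequence is not given in this message, so the three steps here are an assumption chosen to illustrate the idea; the helper names are hypothetical and inputs are taken to be 32-bit signed values.

```python
def sdiv_pow2_expanded(x, k):
    """Signed 32-bit division by 2**k (1 <= k <= 31) using shifts and an add."""
    sign = x >> 31                            # arithmetic shift: 0 or -1
    bias = (sign & 0xFFFFFFFF) >> (32 - k)    # logical shift: 0 or 2**k - 1
    return (x + bias) >> k                    # arithmetic shift after biasing


def sdiv_trunc(x, d):
    """Reference division that rounds toward zero, like the sdiv instruction."""
    q = abs(x) // d
    return -q if x < 0 else q


matches = all(sdiv_pow2_expanded(x, 2) == sdiv_trunc(x, 4) for x in range(-64, 65))
```

The bias is what makes negative inputs round toward zero; without it, an arithmetic shift alone would round toward negative infinity.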

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Hsiangkai Wang 957578ddf7 [CodeGen] Fix bugs in BranchFolderPass when debug labels are generated.
Skip DBG_VALUE and DBG_LABEL in branch folding algorithms.

The bug is reported in
https://bugs.chromium.org/p/chromium/issues/detail?id=898160.

Differential Revision: https://reviews.llvm.org/D54199

llvm-svn: 347964
2018-11-30 08:07:29 +00:00
Mircea Trofin f1a49e8525 Revert "Revert r347596 "Support for inserting profile-directed cache prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.

Fix: correct the use of DenseMap.

Reviewers: davidxl, hans, wmi

Reviewed By: wmi

Subscribers: mgorny, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D55088

llvm-svn: 347938
2018-11-30 01:01:52 +00:00
Sanjay Patel 8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.
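
The arithmetic fact behind narrowing a truncated binop can be checked with a short sketch: for wrapping operations, truncating the wide result equals performing the operation on truncated operands. The set of operators and the sample values below are my own selection for illustration, not the list handled by the patch.

```python
import operator


def trunc(x, bits):
    """Keep only the low `bits` bits (two's complement wrap)."""
    return x & ((1 << bits) - 1)


ops = [operator.add, operator.sub, operator.mul,
       operator.and_, operator.or_, operator.xor]
samples = [0, 1, 0x7F, 0x80, 0xFF, 0x1234, 0xDEADBEEF, 0xFFFFFFFF]

# trunc(op(x, y)) versus op(trunc(x), trunc(y)), both wrapped to 8 bits.
narrowing_is_safe = all(
    trunc(op(x, y), 8) == trunc(op(trunc(x, 8), trunc(y, 8)), 8)
    for op in ops
    for x in samples
    for y in samples
)
```

Operations whose low bits depend on high bits of the inputs, such as right shifts or division, do not satisfy this property and are deliberately left out of the sketch.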

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Alex Bradbury 66d9a752b9 [RISCV] Implement codegen for cmpxchg on RV32IA
Utilise a similar ('late') lowering strategy to D47882. The changes to 
AtomicExpandPass allow this strategy to be utilised by other targets which 
implement shouldExpandAtomicCmpXchgInIR.

All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. 
This is conservative but correct.

Differential Revision: https://reviews.llvm.org/D48131

llvm-svn: 347914
2018-11-29 20:43:42 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper 6cd0b17078 [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre-widens narrow division by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.

I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size? Widen the constant vector by padding with 1?

Differential Revision: https://reviews.llvm.org/D54919

llvm-svn: 347898
2018-11-29 19:13:38 +00:00
Volkan Keles 4fe0080984 [GlobalISel] LegalizationArtifactCombiner: Combine aext([asz]ext x) -> [asz]ext x
Summary:
Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to
reduce the number of instructions generated to clean up some
legalization artifacts.
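
A rough sketch of the rewrite on a toy expression tree, assuming each node is an `(opcode, operand)` tuple. Types are deliberately not modeled; in the real combine the surviving extend would produce the outer aext's destination type. `combine_aext` is a hypothetical helper, not the LegalizationArtifactCombiner API:

```python
EXT_OPCODES = {"aext", "sext", "zext"}


def combine_aext(expr):
    """Rewrite aext(inner_ext x) to inner_ext x; leave other nodes alone."""
    if not isinstance(expr, tuple):
        return expr  # leaf value such as a register name
    opcode, operand = expr
    operand = combine_aext(operand)  # simplify the operand first
    is_inner_ext = isinstance(operand, tuple) and operand[0] in EXT_OPCODES
    if opcode == "aext" and is_inner_ext:
        # The outer any-extend adds no information beyond the inner extend.
        return operand
    return (opcode, operand)
```

Only an outer aext is folded: `sext(aext x)` is left untouched, because the outer sext fixes the high bits while the inner aext does not.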

Reviewers: aditya_nandakumar, dsanders, aemerson, bogner

Reviewed By: aemerson

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D54174

llvm-svn: 347893
2018-11-29 18:19:24 +00:00
Graham Sellers 04f7a4d2d2 [AMDGPU] Add and update scalar instructions
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.

New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).

Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
2018-11-29 16:05:38 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Hans Wennborg 6e3be9d12e Revert r347596 "Support for inserting profile-directed cache prefetches"
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.

This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"

llvm-svn: 347864
2018-11-29 13:58:02 +00:00
Petr Pavlu 6bb80512db [GlobalISel] Fix insertion of stack-protector epilogue
* Tell the StackProtector pass to generate the epilogue instrumentation
  when GlobalISel is enabled because GISel currently does not implement
  the same deferred epilogue insertion as SelectionDAG.
* Update StackProtector::InsertStackProtectors() to find a stack guard
  slot by searching for the llvm.stackprotector intrinsic when the
  prologue was not created by StackProtector itself but the pass still
  needs to generate the epilogue instrumentation. This fixes a problem
  when the pass would abort because the stack guard AllocInst pointer
  was null when generating the epilogue -- test
  CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347862
2018-11-29 13:22:53 +00:00
Nicolai Haehnle 7bed696915 AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.

Use a simple fixpoint iteration instead.
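
As an illustration only, a generic worklist fixpoint over a CFG might look like the sketch below. It assumes the per-block state is a set of pending events merged by union at joins; the block names, the event labels and the `transfer` callback are hypothetical and do not reflect the actual WaitcntBrackets data structures:

```python
from collections import deque


def fixpoint(successors, entry, transfer):
    """Iterate until no block's incoming state changes.

    `successors` maps a block to its successor blocks, `transfer(block, state)`
    returns the block's outgoing state. States are frozensets merged by union,
    so the iteration terminates as long as `transfer` is monotone.
    """
    incoming = {entry: frozenset()}
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        outgoing = transfer(block, incoming[block])
        for succ in successors.get(block, ()):
            merged = incoming.get(succ, frozenset()) | outgoing
            if succ not in incoming or merged != incoming[succ]:
                incoming[succ] = merged
                worklist.append(succ)  # revisit: its input state grew
    return incoming


# Irreducible control flow: the A <-> B cycle can be entered at either block,
# so there is no single loop header for a loop analysis to find.
CFG = {"entry": ["A", "B"], "A": ["B", "exit"], "B": ["A"]}
GENERATED = {"A": {"vmem"}, "B": {"lgkm"}}


def add_generated(block, state):
    """Toy transfer function: a block only adds its own pending events."""
    return state | frozenset(GENERATED.get(block, ()))


RESULT = fixpoint(CFG, "entry", add_generated)
```

No loop information is needed: a block is simply revisited whenever the state flowing into it grows.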

In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.

Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}

Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54231

llvm-svn: 347853
2018-11-29 11:06:26 +00:00
Nicolai Haehnle 1a94cbb3f5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848
2018-11-29 11:06:06 +00:00
Li Jia He bcae407a3c [PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection
Summary:
 A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code 
 includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the 
 most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
 so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
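
 A small sketch of the claim above, assuming the i1 bit pattern 1 is read as -1 when signed and as 1 when unsigned. The function names mirror icmp predicates but are plain Python helpers:

```python
def as_signed_i1(bit):
    """Interpret a 1-bit pattern as signed: 0 -> 0, 1 -> -1."""
    return -bit


def icmp_slt(a, b):
    return as_signed_i1(a) < as_signed_i1(b)


def icmp_sge(a, b):
    return as_signed_i1(a) >= as_signed_i1(b)


def icmp_ult(a, b):
    return a < b


def icmp_uge(a, b):
    return a >= b


def signed_matches_swapped_unsigned():
    """Check every i1 pair: a signed compare equals the unsigned one with swapped operands."""
    pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
    return all(
        icmp_slt(a, b) == icmp_ult(b, a) and icmp_sge(a, b) == icmp_uge(b, a)
        for a, b in pairs
    )
```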

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54825

llvm-svn: 347831
2018-11-29 03:04:39 +00:00
Li Jia He 339af52804 [PowerPC] [NFC] Add test cases to the ISD::BR_CC node in the instruction selection
Add the following test case for the ISD::BR_CC node in the instruction selection
define i64 @testi64slt(i64 %c1, i64 %c2, i64 %c3, i64 %c4, i64 %a1, i64 %a2) #0 {
entry:
  %cmp1 = icmp eq i64 %c3, %c4
  %cmp3tmp = icmp eq i64 %c1, %c2
  %cmp3 = icmp slt i1 %cmp3tmp, %cmp1
  br i1 %cmp3, label %iftrue, label %iffalse
iftrue:
  ret i64 %a1
iffalse:
  ret i64 %a2
}
The data type i64 can be replaced by i32, i64, float, double

And condition codes can be replaced by: SETEQ, SETNE, SETLT, SETLE, SETGT, SETGE, SETULT, SETULE, SETUGT, and SETUGE

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54824

llvm-svn: 347828
2018-11-29 02:51:03 +00:00
Sanjay Patel 2de209313e [x86] try select simplification for target-specific nodes
This failed to select (which might be a separate bug) in
X86ISelDAGToDAG because we try to create a select node
that can be simplified away after rL347227.

This change avoids the problem by simplifying the SHRUNKBLEND
node sooner. In the test case, we manage to realize that the
true/false values of the select (SHRUNKBLEND) are the same thing,
so it simplifies away completely.

llvm-svn: 347818
2018-11-28 22:51:04 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI
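
A sketch of the arithmetic behind this combine, using 8-bit lanes so that it can be checked exhaustively. It assumes the merged shift amount is the sum of the two amounts and that an amount of `BITS - 1` or more fills the lane with the sign bit; whether the patch clamps in exactly this way is an assumption:

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(value):
    """Reinterpret a BITS-wide bit pattern as a signed integer."""
    value &= MASK
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value


def sra(value, amount):
    """Arithmetic shift right; oversized amounts saturate to all sign bits."""
    amount = min(amount, BITS - 1)
    return (to_signed(value) >> amount) & MASK


def sra_twice(value, first, second):
    return sra(sra(value, first), second)


def sra_combined(value, first, second):
    return sra(value, first + second)
```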

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Francis Visoiu Mistrih 879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This adds support for this.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Mircea Trofin 35f0e5cd2d Do not insert prefetches with unsupported memory operands.
Summary:
Ignore advice where the memory operand of the 'anchor' instruction
uses unsupported register types.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54983

llvm-svn: 347724
2018-11-28 01:08:45 +00:00
Craig Topper 5fb34b5498 [X86] Add cascade lake arch in X86 target.
This is skylake-avx512 with the addition of avx512vnni ISA.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D54785

llvm-svn: 347681
2018-11-27 18:05:00 +00:00
Sanjay Patel 3827aabe75 [x86] regenerate checks; NFC
llvm-svn: 347661
2018-11-27 15:52:17 +00:00
Stanislav Mekhanoshin 443a7f9788 [AMDGPU] Disable DAG combine at -O0
Differential Revision: https://reviews.llvm.org/D54358

llvm-svn: 347659
2018-11-27 15:13:37 +00:00
Craig Topper 587b981fca [X86] Add test cases for vector shifts of v2i32/v2i16/v4i16/v2i8/v4i8/v8i8 with promotion legalization and widening legalization. NFC
llvm-svn: 347643
2018-11-27 07:20:19 +00:00
Craig Topper 4325505f05 [X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets.
If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.

We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.

llvm-svn: 347632
2018-11-27 02:57:27 +00:00
Craig Topper fe3bbb251b [X86] Add a bunch of test cases for storing a scalar bitcasted from a vXi1 type.
Currently a store combine will absorb the bitcast before our combine that turns bitcasts into movmsk gets a chance to run. This results in a store being created with a vXi1 type. Type legalization then promotes the input type and makes this a truncating store. Then we badly scalarize this store.

Currently we avoid this on v8i1->i8 bitcasts due to an incompletely qualified (per the original intention) check in isLoadBitCastBeneficial. An easy fix is to disable this for all vXi1->iX bitcasts on pre-avx512 targets. We'll still generate terrible code if the IR explicitly contains a store of vXi1 without a bitcast. We could probably solve that by just turning all stores of vXi1 into (store (iX (bitcast))) as an early DAG combine.

llvm-svn: 347631
2018-11-27 02:57:23 +00:00
Sterling Augustine 9cc1ffadc5 Notify the linker when a TU compiled with split-stack has a function without a prologue.
More context here: https://go-review.googlesource.com/c/go/+/148819/

llvm-svn: 347614
2018-11-26 23:26:31 +00:00
Mircea Trofin 183df14520 Add new passes to X86 pipeline tests
Summary: Fixes test failures introduced by rL347596.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54916

llvm-svn: 347607
2018-11-26 22:49:17 +00:00
Mircea Trofin cfbc1788d6 Support for inserting profile-directed cache prefetches
Summary:
Support for profile-driven cache prefetching (X86)

This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.

A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.

The recommender makes prefetch recommendations in terms of:

* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)

meaning: a prefetch of that type should be inserted right before the instruction at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.

For example:

0x400ab2,64,nta

and assuming the instruction at 0x400ab2 is:

movzbl (%rbx,%rdx,1),%edx

means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:

prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx
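
To make the example concrete, here is a minimal sketch that parses one recommendation line and renders the prefetch shown above. It assumes the three-field `offset,delta,type` layout of the example, with a hexadecimal offset and a decimal delta. `PrefetchHint`, `parse_hint` and `render_prefetch` are hypothetical names, not part of the recommender, create_llvm_prof, or LLVM:

```python
from typing import NamedTuple

PREFETCH_TYPES = {"nta", "t0", "t1", "t2"}


class PrefetchHint(NamedTuple):
    offset: int  # binary offset of the instruction with the memory operand
    delta: int   # distance, in bytes, from the address that instruction accesses
    kind: str    # one of PREFETCH_TYPES


def parse_hint(line):
    """Parse a line such as '0x400ab2,64,nta' (assumed format)."""
    offset_text, delta_text, kind = (field.strip() for field in line.split(","))
    if kind not in PREFETCH_TYPES:
        raise ValueError(f"unknown prefetch type: {kind!r}")
    return PrefetchHint(int(offset_text, 16), int(delta_text), kind)


def render_prefetch(hint, base, index, scale):
    """Render the AT&T-syntax prefetch for a base/index/scale memory operand."""
    return f"prefetch{hint.kind} {hint.delta:#x}(%{base},%{index},{scale})"
```

For the line `0x400ab2,64,nta` and the operand `(%rbx,%rdx,1)`, this renders `prefetchnta 0x40(%rbx,%rdx,1)`, matching the instruction above.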

The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):

1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).

2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).

3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.

4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.

Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.

The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.

Reviewers: davidxl, wmi, craig.topper

Reviewed By: davidxl, wmi, craig.topper

Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D54052

llvm-svn: 347596
2018-11-26 21:36:18 +00:00
Craig Topper b955bf382c [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_UINT until AVX512.

This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

Differential Revision: https://reviews.llvm.org/D54906

llvm-svn: 347593
2018-11-26 21:12:39 +00:00
Craig Topper 923f463ef2 [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or
We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C)

Differential Revision: https://reviews.llvm.org/D54818

llvm-svn: 347591
2018-11-26 20:16:33 +00:00
Craig Topper 2754d1dca4 [X86] Add test case for D54818
llvm-svn: 347590
2018-11-26 20:16:31 +00:00
Matt Arsenault dcdf3ddff5 AMDGPU: Cleanup / relax tests for future changes
llvm-svn: 347576
2018-11-26 17:17:07 +00:00
Than McIntosh b9e4852c92 [CodeGen] Take SPAdj into account for STATEPOINT liveness args
Summary:
STATEPOINT records its args' locations on stack relative to SP.
If the SP is changed, take that into account.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D53603

llvm-svn: 347569
2018-11-26 16:16:09 +00:00
Sanjay Patel d31220e0de [x86] promote all multiply i8 by constant to i32
We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's 
independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of 
our existing LEA and other multiply expansion magic happens as it would for i32 ops.

Some of the test diffs show that we could end up with an actual 32-bit mul instruction here 
because we choose not to expand to simpler ops. That instruction could be slower depending on the 
subtarget. On the plus side, this means we don't need a separate instruction to load the constant 
operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, 
we could add a later transform that tries to shrink it back to i8 based on subtarget timing.

I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to 
test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means 
LEA is generally cheap.

Differential Revision: https://reviews.llvm.org/D54803

llvm-svn: 347557
2018-11-26 15:22:30 +00:00
Diana Picus 0528e2cfb3 [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.

Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.
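
A sketch of what lowering G_CTLZ in terms of G_CTLZ_ZERO_UNDEF amounts to for s32, assuming the usual select-on-zero form. `ctlz_zero_undef` here is a Python stand-in for the libcall, not the legalizer code:

```python
BITS = 32


def ctlz_zero_undef(value):
    """Count leading zeros; the result is undefined for zero, so reject it."""
    if value == 0:
        raise ValueError("ctlz_zero_undef is undefined for 0")
    return BITS - value.bit_length()


def ctlz(value):
    """G_CTLZ built on top of the zero-undef variant: zero maps to the bit width."""
    value &= (1 << BITS) - 1
    return BITS if value == 0 else ctlz_zero_undef(value)
```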

In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).

We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).

Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.

Instruction select:
Nothing to do.

llvm-svn: 347545
2018-11-26 11:07:02 +00:00
Sam Parker 5338f7aae4 [ARM] Prevent parallel macs for unsigned values
Both zext and sext are currently allowed during the search for narrow
sequences and sext operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.

Differential Revision: https://reviews.llvm.org/D54790

llvm-svn: 347542
2018-11-26 10:22:55 +00:00
Kang Zhang 840e98f9f1 Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction"
This reverts commit r347532. I forgot to add the option
-mtriple powerpc64-unknown-linux-gnu, so the test fails on every platform
except PowerPC.

llvm-svn: 347534
2018-11-26 07:15:31 +00:00
Craig Topper b7a50e5796 [X86] Add test cases to show bad type legalization of fptosi/fptoui v16f32->v16i8 and v8f64->v8i16 on pre-AVX512 targets.
When splitting the v16f32/v8f64 result type, type legalization will try to promote the integer result type before a concat and an explicit truncate. But for the fptoui test case this is particularly bad since fptoui isn't supported on X86 until AVX512. We could use an fptosi since the result range would fit in a signed 32-bit value, but the generic type legalization doesn't do that transformation when splitting. It does do this when promoting.

llvm-svn: 347533
2018-11-26 06:50:19 +00:00
Kang Zhang e98d4f511c [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have an inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 347532
2018-11-26 06:03:25 +00:00
Sanjay Patel 7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Sanjay Patel 2e5a25c170 [x86] add tests for select-of-fp-constants; NFC
There are many options here depending on subtarget,
but we are uniformly relying on a transform that was 
driven by performance for a 32-bit SSE2 target in 2009.

Note: The same motivation was apparently used to do this 
transform for *all* targets, so non-x86 may want to look
at this too.

llvm-svn: 347525
2018-11-25 16:54:43 +00:00
Sanjay Patel 7e119c0400 [DAG] consolidate shift simplifications
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.

The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.

llvm-svn: 347502
2018-11-23 20:05:12 +00:00
Sanjay Patel e0cc876363 [x86] make test immune to oversized shift simplification
I'm not sure if this actually preserves the original intent
of this test, but if we leave it as-is, the -1 (oversized)
shift should be folded to undef and allow deleting half
of the output.

llvm-svn: 347501
2018-11-23 19:45:29 +00:00
Luke Cheeseman 6db3a6a4a7 Revert r347490 as it breaks address sanitizer builds
llvm-svn: 347499
2018-11-23 17:13:06 +00:00
Luke Cheeseman d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
Sjoerd Meijer fc448cfd25 [ARM][NFC] codegen tests cleanup: remove dangling check prefixes
I am working on making FileCheck stricter (in D54769 and D53710) so that it
issues diagnostics when there's something wrong with tests.

This is a cleanup for dangling prefixes in the ARM codegen tests, e.g.:

--check-prefixes=A,B

where A occurs in the check file, but B doesn't. This can be innocent if A does
all the required checking, but can also be a bug in that test if it results in
the test actually not checking anything (if A for example only checks a common
label). Test CodeGen/ARM/smml.ll is such an example.

Differential Revision: https://reviews.llvm.org/D54842

llvm-svn: 347487
2018-11-23 10:08:39 +00:00
Craig Topper 0ec17884de [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type.
This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together.

But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code.

The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.

llvm-svn: 347482
2018-11-23 02:32:13 +00:00
Craig Topper b239763384 [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type.
SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that.

The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.

llvm-svn: 347479
2018-11-22 22:56:52 +00:00
Sanjay Patel 3e80019275 [DAGCombiner] form 'not' ops ahead of shifts (PR39657)
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was 
reversing that transform. I think we need this transform in the backend 
regardless of what happens in IR to catch cases where the shift-xor 
is formed late from GEP or other ops.

https://rise4fun.com/Alive/NC1

  Name: shl
  Pre: (-1 << C2) == C1
  %shl = shl i8 %x, C2
  %r = xor i8 %shl, C1
  =>
  %not = xor i8 %x, -1
  %r = shl i8 %not, C2
  
  Name: shr
  Pre: (-1 u>> C2) == C1
  %sh = lshr i8 %x, C2
  %r = xor i8 %sh, C1
  =>
  %not = xor i8 %x, -1
  %r = lshr i8 %not, C2
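
  The two rewrites above can also be checked exhaustively for i8 with a short sketch. It substitutes the preconditions directly, so C1 is computed as -1 << C2 and -1 u>> C2. The helper names are illustrative only:

```python
MASK = 0xFF  # i8


def shl(value, amount):
    return (value << amount) & MASK


def lshr(value, amount):
    return (value & MASK) >> amount


def not_(value):
    return value ^ MASK  # xor with -1


def xor_after_shl(x, c2):
    """Original form: shift, then xor with C1 == (-1 << C2)."""
    return shl(x, c2) ^ shl(MASK, c2)


def not_before_shl(x, c2):
    """Rewritten form: 'not' first, then shift."""
    return shl(not_(x), c2)


def xor_after_lshr(x, c2):
    """Original form: shift, then xor with C1 == (-1 u>> C2)."""
    return lshr(x, c2) ^ lshr(MASK, c2)


def not_before_lshr(x, c2):
    """Rewritten form: 'not' first, then shift."""
    return lshr(not_(x), c2)
```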

https://bugs.llvm.org/show_bug.cgi?id=39657

llvm-svn: 347478
2018-11-22 19:24:10 +00:00
John Brawn d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Diana Picus 6b37655740 [ARM GlobalISel] Add test for BFC. NFCI
r334871 has made it possible for TableGen'erated code to select BFC, but
it has not added a test for it on the ARM side. Add it now to make sure
we don't introduce regressions if we ever change anything about that
rule.

llvm-svn: 347447
2018-11-22 09:54:14 +00:00
Sanjay Patel 1afd38f008 [x86] use FileCheck to verify output; NFC
llvm-svn: 347438
2018-11-21 23:39:19 +00:00
Reid Kleckner 86ada54e4c [mingw] Use unmangled name after the $ in the section name
GCC does it this way, and we have to be consistent. This includes
stdcall and fastcall functions with suffixes. I confirmed that a
fastcall function named "foo" ends up in ".text$foo", not
".text$@foo@8".

Based on a patch by Andrew Yohn!

Fixes PR39218.

Differential Revision: https://reviews.llvm.org/D54762

llvm-svn: 347431
2018-11-21 22:01:10 +00:00
Sanjay Patel 78e2b901e5 [x86] add tests for select-of-FP-constants; NFC
llvm-svn: 347406
2018-11-21 19:14:38 +00:00
Sanjay Patel cadf62f360 [x86] fix predicate for avoiding vblendv
It only makes sense to produce the logic ops when 1 of the
constants is +0.0. Otherwise, go with vblendv to reduce code.

llvm-svn: 347403
2018-11-21 18:02:50 +00:00
Sanjay Patel 5ba384347c [x86] add test for FP select with constant; NFC
llvm-svn: 347401
2018-11-21 17:47:18 +00:00
Sanjay Patel 2c513f5b4b [x86] add checks for asm to test; NFC
llvm-svn: 347394
2018-11-21 15:26:35 +00:00
Simon Pilgrim 66bae9aee8 [X86][AVX] Remove BROADCAST if we only need the 0'th element
We don't catch this with target shuffle simplification if the src/dst types are different.

llvm-svn: 347386
2018-11-21 11:00:09 +00:00
Craig Topper e9b4001a82 [X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1.
The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example.

llvm-svn: 347380
2018-11-21 07:01:22 +00:00
Nemanja Ivanovic 5cf902ccd4 [PowerPC] Do not use vectors to codegen bswap with Altivec turned off
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334

llvm-svn: 347376
2018-11-21 02:53:50 +00:00
Craig Topper 27a5896fe8 [X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly.
These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.

Fixes PR39733

llvm-svn: 347375
2018-11-21 01:39:38 +00:00
Craig Topper 8b48587f5b [X86] Add a copy of avx512-trunc.ll with -x86-experimental-vector-widening-legalization enabled.
llvm-svn: 347374
2018-11-21 01:39:35 +00:00
Craig Topper aa52ee2770 [X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8.
We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.

Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.

llvm-svn: 347361
2018-11-20 22:57:48 +00:00
Sanjay Patel 357053f289 [DAGCombiner] look through bitcasts when trying to narrow vector binops
This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).

The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.

The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.

Differential Revision: https://reviews.llvm.org/D54392

llvm-svn: 347356
2018-11-20 22:26:35 +00:00
Craig Topper 24b346da42 [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.
Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

llvm-svn: 347348
2018-11-20 21:21:52 +00:00
Sanjay Patel fa78c228a3 [x86] add tests for 8-bit multiply with constant; NFC
This is based on the existing file for 16-bit. We also already have 32-bit and 64-bit variants.

llvm-svn: 347341
2018-11-20 19:45:53 +00:00
Sam Clegg 4791a668f5 [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0
Rather than assuming that `tempRet0` exists in linear memory only assume
the getter/setter functions exist.  This avoids conflicting with
binaryen which declares a wasm global for this purpose and defines its
own getter and setter for that.

The other advantage of doing things this way is that it leaves
it up to the linker/finalizer to decide how to actually store this
temporary.  As it happens binaryen uses a wasm global which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Differential Revision: https://reviews.llvm.org/D53240

llvm-svn: 347340
2018-11-20 19:25:07 +00:00
Simon Pilgrim 368a199236 [X86] Remove -verify-machineinstrs=0 now that PR38391 is fixed.
llvm-svn: 347335
2018-11-20 18:08:56 +00:00
Simon Pilgrim bac49ac455 [AMDGPU] Regenerate weird stores tests.
Makes an upcoming SimplifyDemandedBits optimization much easier to understand.

llvm-svn: 347326
2018-11-20 17:04:02 +00:00
Sanjay Patel 8aeffd8c57 [AArch64, x86] add tests for shift-not (PR39657); NFC
llvm-svn: 347316
2018-11-20 15:49:42 +00:00
Simon Pilgrim 3735105961 [DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989)
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.

llvm-svn: 347313
2018-11-20 15:23:50 +00:00
Jinsong Ji 9a0ed20072 [PowerPC] Add Itineraries for STWU/STWUX etc
When doing some instruction scheduling work, we noticed some missing itineraries.

Before we switch to the machine scheduler, those missing itineraries might not have an impact on actual scheduling,
because we can still get the same latency due to default values.

With the machine scheduler, however, itineraries will have an impact on scheduling.
e.g. NumMicroOps will default to 0 if there are no itineraries for a specific instruction class.
And most of the instruction classes with itineraries will have NumMicroOps default to 1.

This will have an impact on the count of RetiredMOps and affect the Pending/Available Queue,
then cause different or suboptimal scheduling.

This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.

Since there are already multiple IICs for store update, this patch also merges
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX

and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.

Differential Revision: https://reviews.llvm.org/D54700

llvm-svn: 347311
2018-11-20 15:11:42 +00:00
Jinsong Ji 42c13c22bc [PowerPC][NFC]Add testcase for STWU scheduling check
This patch adds a STWU testcase for scheduling check.

Currently P7/P8, which use itineraries, are missing IIC_LdStStoreUpd.
We use the CHECK-ITIN prefix to check P7/P8, then use the default for P9 (and future).

We will fix the missing itineraries of IIC_LdStStoreUpd in a following patch,
and update this testcase to show the scheduling difference only there.

Differential Revision: https://reviews.llvm.org/D54699

llvm-svn: 347310
2018-11-20 14:55:43 +00:00
Simon Pilgrim ee8b96f253 [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions.
Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.

llvm-svn: 347303
2018-11-20 13:23:37 +00:00
Simon Pilgrim b356d0463e [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support
For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask.

I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.

Differential Revision: https://reviews.llvm.org/D54679

llvm-svn: 347301
2018-11-20 12:02:16 +00:00
Simon Pilgrim a6fb85ffa7 [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE.
As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.

llvm-svn: 347300
2018-11-20 11:46:37 +00:00
Simon Pilgrim 7198506ba8 [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions.
As discussed on rL347240.

llvm-svn: 347299
2018-11-20 11:09:46 +00:00
Craig Topper 17fa42a69b [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.

As near as I can tell this makes v16i8 behavior consistent with every other VT now.

This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.

llvm-svn: 347296
2018-11-20 09:04:01 +00:00
Craig Topper c733c7bf94 [X86] Replace more calls to getZeroVector with regular getConstant.
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.

The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.

llvm-svn: 347290
2018-11-20 06:54:01 +00:00
Nemanja Ivanovic 9b393909e2 [PowerPC] Don't combine to bswap store on 1-byte truncating store
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.

llvm-svn: 347288
2018-11-20 04:42:31 +00:00
Craig Topper 4954c66430 [SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use fewer multiply instructions to emulate a 64 bit multiply.

This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362

Reviewers: spatel, efriedma, RKSimon, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D54725

llvm-svn: 347287
2018-11-20 04:30:26 +00:00
Stanislav Mekhanoshin 54ebfe8aee Implement computeKnownBits for scalar_to_vector
Differential Revision: https://reviews.llvm.org/D54728

llvm-svn: 347274
2018-11-19 23:34:07 +00:00
Craig Topper dbe3473634 [X86] Add test case to show missed opportunity to use a single pmuludq to implement a multiply when a zext lives in another basic block.
This can occur when one of the inputs to the multiply is loop invariant. Though my test cases just use two basic blocks with an unconditional jump which we won't merge until after isel in the codegen pipeline.

For scalars, I believe SelectionDAGBuilder can add an AssertZExt to pass knowledge across basic blocks but it's explicitly disabled for vectors.

llvm-svn: 347266
2018-11-19 22:04:12 +00:00
Konstantin Zhuravlyov 700b1ef54d AMDGPU: Fix V_FMA_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D54545

llvm-svn: 347265
2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin 8bafbae889 [AMDGPU] Restored selection of scalar_to_vector (v2x16)
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.

Differential Revision: https://reviews.llvm.org/D54718

llvm-svn: 347259
2018-11-19 19:58:13 +00:00
Simon Pilgrim de3605f56b [TargetLowering] expandFP_TO_UINT - improve fp16 support
As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode.

I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
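
The range argument can be illustrated with a small Python sketch (not taken from the patch; the helper names are hypothetical). The largest finite fp16 value is 65504, which fits in the non-negative range of a signed i32, so a signed conversion yields the same result as an unsigned one for every in-range input.

```python
import struct

def half_bits_to_float(bits):
    # Decode an IEEE 754 binary16 bit pattern using the 'e' struct format.
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# 0x7BFF is the largest finite half-precision value.
MAX_HALF = half_bits_to_float(0x7BFF)

def fits_signed(max_value, int_bits):
    # True when every non-negative float up to max_value is representable
    # in the non-negative half of a signed integer of int_bits bits.
    return max_value < 2 ** (int_bits - 1)
```

With these definitions `fits_signed(MAX_HALF, 32)` is true while `fits_signed(MAX_HALF, 16)` is false, which is the kind of narrow-destination case the message says is handled by promotion instead.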

Differential Revision: https://reviews.llvm.org/D54703

llvm-svn: 347251
2018-11-19 19:16:13 +00:00
Simon Pilgrim c4861ab170 [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703)
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.

Differential Revision: https://reviews.llvm.org/D54707

llvm-svn: 347242
2018-11-19 18:40:59 +00:00
Craig Topper 311bbcd535 [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any-extended, then concatenated the results back together.

This patch instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. It multiplies all the low half lanes and high half lanes together in separate operations, then merges the half lane results back together using packuswb.
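
A rough Python model of why an any-extend is enough for a byte multiply: whatever ends up in the upper byte of each 16-bit lane after the unpack, the low byte of the 16-bit product is unchanged. The function names and the explicit masking step before the pack are assumptions made for this sketch, not a description of the exact DAG nodes the patch emits.

```python
def punpck_bytes(src, junk):
    # Put each source byte in the low half of a 16-bit lane. The high half
    # is "don't care" (any-extend), modelled here by arbitrary junk bytes.
    return [((j & 0xFF) << 8) | (s & 0xFF) for s, j in zip(src, junk)]

def mul_bytes_via_words(a, b, junk_a, junk_b):
    wa = punpck_bytes(a, junk_a)
    wb = punpck_bytes(b, junk_b)
    products = [(x * y) & 0xFFFF for x, y in zip(wa, wb)]  # 16-bit multiply
    # Assumed step: clear the high byte so the unsigned saturation done by
    # a packuswb-style pack cannot alter the value.
    return [p & 0xFF for p in products]

def mul_bytes_reference(a, b):
    return [(x * y) & 0xFF for x, y in zip(a, b)]
```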

Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.

Differential Revision: https://reviews.llvm.org/D54668

llvm-svn: 347240
2018-11-19 18:32:53 +00:00
Sam Parker 1c803f5988 [ARM] Attempt to fix arm selfhost bots after rL347191
llvm-svn: 347238
2018-11-19 18:08:46 +00:00
Stanislav Mekhanoshin 054f8101f1 [AMDGPU] Convert insert_vector_elt into set of selects
This allows us to avoid scratch use or indirect VGPR addressing for
small vectors.

Differential Revision: https://reviews.llvm.org/D54606

llvm-svn: 347231
2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen 49482f824a [WebAssembly] replaced .param/.result by .functype
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
  of this directive anymore.
- Actually constructs the signature in the assembler, and make the
  assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
  to require less conversions overall.
- Changed 700 or so tests to use it.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54652

llvm-svn: 347228
2018-11-19 17:10:36 +00:00
Sanjay Patel b25adf5edb [SelectionDAG] simplify vector select with undef operand(s)
llvm-svn: 347227
2018-11-19 17:06:05 +00:00
Sanjay Patel 08c0a0ac58 [Hexagon] make test immune to improvements in undef simplification
llvm-svn: 347218
2018-11-19 15:34:09 +00:00
Sanjay Patel 60abc29b0a [x86] add/make tests immune to improvements in undef simplification
llvm-svn: 347217
2018-11-19 15:33:44 +00:00
Sanjay Patel a1dca3553e [SelectionDAG] simplify select FP with undef condition
llvm-svn: 347212
2018-11-19 14:42:28 +00:00
Sanjay Patel 7a51bdcf3b [x86] add test for select FP with undef condition; NFC
llvm-svn: 347211
2018-11-19 14:39:57 +00:00
Martin Elshuber fef3036d37 Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVM's capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on AArch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Simon Pilgrim f6c2fbdd1a [X86] Add codegen tests for slow-shld scalar funnel shifts
llvm-svn: 347195
2018-11-19 12:29:41 +00:00
Sam Parker e7c42dd7e2 [ARM] Remove trunc sinks in ARM CGP
Truncs are treated as sources if they produce a value of the same
type as the one we are currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
    
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
    
This leaves sinks as being:
  - points where the value in the register is being observed, such as
    an icmp, switch or store.
  - points where value types have to match, such as calls and returns.
  - zext are included to ease the transformation and are generally
    removed later on.
    
During this change, it also became apparent that truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.

Differential Revision: https://reviews.llvm.org/D54515

llvm-svn: 347191
2018-11-19 11:34:40 +00:00
Anton Korobeynikov 4df19b75c0 [MSP430] Optimize srl/sra in case of A >> (8 + N)
There are no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.
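
One way to picture the trick on a 16-bit value, as a hedged Python sketch: a byte swap moves the high byte into the low byte, a zero or sign extension of that byte gives the value shifted right by 8, and only the remaining N bits need to be shifted. The helper names are invented for the example and the exact node sequence in the patch may differ.

```python
def bswap16(a):
    return ((a << 8) | (a >> 8)) & 0xFFFF

def zext8(a):
    return a & 0xFF

def sext8(a):
    v = a & 0xFF
    return (v | 0xFF00) if (v & 0x80) else v   # 16-bit two's complement pattern

def srl16(a, n):
    return (a & 0xFFFF) >> n

def sra16(a, n):
    signed = a - 0x10000 if a & 0x8000 else a
    return (signed >> n) & 0xFFFF

def srl_via_bswap(a, n):
    # A >> (8 + N), logical
    return srl16(zext8(bswap16(a)), n)

def sra_via_bswap(a, n):
    # A >> (8 + N), arithmetic
    return sra16(sext8(bswap16(a)), n)
```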

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54623

llvm-svn: 347187
2018-11-19 10:43:02 +00:00
Craig Topper 8b22bcd39f [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
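
The equivalence behind this change can be checked per 32-bit lane with a short Python sketch (illustrative only; the function names are hypothetical): an arithmetic shift right by 31 and a signed "0 > x" compare both produce all-ones for negative lanes and zero otherwise.

```python
def to_signed32(x):
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def psrad31_lane(x):
    # Arithmetic shift right by 31 of one 32-bit lane.
    return (to_signed32(x) >> 31) & 0xFFFFFFFF

def pcmpgt_zero_lane(x):
    # Signed compare "0 > x", producing an all-ones or all-zeros lane.
    return 0xFFFFFFFF if 0 > to_signed32(x) else 0
```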

llvm-svn: 347185
2018-11-19 07:22:26 +00:00
Craig Topper 3616891046 [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.

llvm-svn: 347181
2018-11-19 04:33:20 +00:00
Craig Topper 053f1eea96 [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180
2018-11-19 00:33:16 +00:00
Simon Pilgrim 7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Craig Topper 0468c860b7 [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using a punpckldq and punpckhdq.

llvm-svn: 347176
2018-11-18 21:28:50 +00:00
Craig Topper 950f3842cc [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support.
Some of these sequences look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend.

llvm-svn: 347175
2018-11-18 21:28:47 +00:00
Simon Pilgrim b31bdbd2e9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
2018-11-18 20:21:52 +00:00
Craig Topper 11d50948e2 [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.

llvm-svn: 347172
2018-11-18 18:11:25 +00:00
Craig Topper bc8148f7b0 [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
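
The role of the 'and' with 255 can be shown with a small per-lane model in Python (a sketch with made-up helper names, not code from the patch). packuswb narrows signed 16-bit lanes to bytes with unsigned saturation, so it matches a plain truncate only when each lane is already in the range 0..255, which the mask guarantees.

```python
def packuswb_lane(w):
    # Treat the 16-bit lane as signed and saturate it to an unsigned byte.
    signed = w - 0x10000 if w & 0x8000 else w
    return max(0, min(255, signed))

def trunc_via_and_pack(words):
    return [packuswb_lane(w & 0x00FF) for w in words]

def trunc_reference(words):
    return [w & 0xFF for w in words]
```

Without the mask the results differ: a lane holding 0x0100 would saturate to 255 rather than truncate to 0.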

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171
2018-11-18 17:59:28 +00:00
Sanjay Patel 8c0cd77bff [DAG] add undef simplifications for select nodes
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code, 
but copying the folds seems to be the standard method to ensure 
that we don't miss these folds. 

Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
way to ensure that we do these kinds of simplifications unless the 
code is repeated at node creation time and during combines.

There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167

I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and 
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.

This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023

llvm-svn: 347170
2018-11-18 17:36:23 +00:00
Sanjay Patel bc23408fe5 [x86] regenerate full checks; NFC
llvm-svn: 347167
2018-11-18 16:56:17 +00:00
Sanjay Patel 7e659ef4b1 [SystemZ] make test immune to improvements in undef simplification
llvm-svn: 347166
2018-11-18 16:50:44 +00:00
Sanjay Patel cb04e590d3 [Hexagon] make tests immune to improvements in undef simplification
llvm-svn: 347165
2018-11-18 16:50:16 +00:00
Sanjay Patel becf03efa1 [ARM] make test immune to improvements in undef simplification
llvm-svn: 347164
2018-11-18 16:49:42 +00:00
Simon Pilgrim fec9f8657b [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.
Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.

llvm-svn: 347162
2018-11-18 15:52:08 +00:00
Sanjay Patel 40509997eb [x86] make tests immune to improvements in undef handling
llvm-svn: 347161
2018-11-18 15:27:19 +00:00
Simon Pilgrim 7fdbae3224 [X86][SSE] Add some generic masked gather codegen tests
llvm-svn: 347159
2018-11-18 14:35:57 +00:00
Simon Pilgrim cc1f5d2407 [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549)
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.

llvm-svn: 347158
2018-11-18 13:34:53 +00:00
Heejin Ahn e0f8b9bfc6 [WebAssembly] Add null streamer support
Summary: Now `llc -filetype=null` works.

Reviewers: eush

Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54660

llvm-svn: 347155
2018-11-18 11:58:47 +00:00
Craig Topper f56a57518d [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.
The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.

llvm-svn: 347149
2018-11-18 05:53:21 +00:00
Craig Topper 0438d791fa [X86] Add support for matching PACKUSWB from a v64i8 shuffle.
llvm-svn: 347143
2018-11-17 18:54:43 +00:00
Craig Topper c6c760f07f [X86] Add test case to show missed opportunity to use PACKUSWB in v64i8 shuffle lowering.
llvm-svn: 347142
2018-11-17 18:54:41 +00:00
Simon Pilgrim 0e1a9d5ee6 [X86][SSE] Add shuffle demanded elts test case for PR39549
llvm-svn: 347139
2018-11-17 14:06:03 +00:00
Craig Topper dd61f11642 [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.
llvm-svn: 347131
2018-11-17 02:36:07 +00:00
Craig Topper d8da95bbe3 [X86] Add test cases to show incorrect use of a 512 bit vector in v32i8 multiply lowering with prefer-vector-width=256.
On the min-legal-vector-width test this actually causes some of the v32i16 operations we emitted to be scalarized.

llvm-svn: 347130
2018-11-17 02:36:02 +00:00
Stanislav Mekhanoshin c12d64ab16 Moved dag-combine-select-undef.ll into amdgpu. NFC.
The test really needs the target arch to be specified.

llvm-svn: 347115
2018-11-17 00:17:15 +00:00
Stanislav Mekhanoshin c3214ad1dd Fixed test after r347110
Comments in llc outputs are printed differently on different
platforms, some with '#', some with '##'. Removed non-essential
part of the checks.

llvm-svn: 347112
2018-11-16 23:40:04 +00:00
Stanislav Mekhanoshin 0ff7c8309d DAG combiner: fold (select, C, X, undef) -> X
Differential Revision: https://reviews.llvm.org/D54646

llvm-svn: 347110
2018-11-16 23:13:38 +00:00
Craig Topper ee0333b4a9 [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.
This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.

There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.

llvm-svn: 347105
2018-11-16 22:53:00 +00:00
Sam Clegg a2827edc2f [WebAssembly] Cleanup unused declares in test code. NFC.
In one case probably you have be using it, in the other it
looks like it was redundant.

Differential Revision: https://reviews.llvm.org/D54644

llvm-svn: 347098
2018-11-16 21:20:00 +00:00
Nemanja Ivanovic ed6159bb71 [PowerPC][NFC] Add tests for vector fp <-> int conversions
This NFC patch just adds test cases for conversions that currently
require scalarization of vectors. An upcoming patch will change
the legalization for these and it is more suitable on the review
to show the differences in code gen rather than just the new code gen.

llvm-svn: 347090
2018-11-16 20:24:10 +00:00
Peter Collingbourne 527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Anton Korobeynikov e5cb1c35b4 [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54626

llvm-svn: 347080
2018-11-16 19:36:15 +00:00
Rong Xu 3a38175723 [X86] Disable Condbr_merge pass
Disable Condbr_merge pass for now due to PR39658.
Will reenable the pass once the bug is fixed.

llvm-svn: 347079
2018-11-16 19:35:00 +00:00
Stefan Pintilie 9004444d81 Revert "[PowerPC] Make no-PIC default to match GCC - LLVM"
This reverts commit r347069

llvm-svn: 347076
2018-11-16 19:24:23 +00:00
Sam Clegg 74f5fd4e32 [WebAssembly] Default to static reloc model
Differential Revision: https://reviews.llvm.org/D54637

llvm-svn: 347073
2018-11-16 18:59:51 +00:00
Stefan Pintilie 046eff502f [PowerPC] Make no-PIC default to match GCC - LLVM
Set -fno-PIC as the default option.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 347069
2018-11-16 18:36:21 +00:00
Simon Pilgrim 96f7924fe2 [X86] Add codegen tests for scalar funnel shifts
llvm-svn: 347066
2018-11-16 17:48:52 +00:00
Sanjay Patel 8da76a6581 [x86] regenerate complete checks for test; NFC
llvm-svn: 347051
2018-11-16 14:44:20 +00:00
Roman Lebedev 90c5b3f78e [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`
Summary:
As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!),
we can fold the `Z` into `control`, and let the `BEXTR` do this too.
We could just insert those 8 bits of shift amount into control,
but it is better to instead zero-extend them, and 'or' them in place.

We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`,
and not any of the sign-extended bits.

The obvious question is, is this actually legal to do?
I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
* `A START value exceeding the operand size will not extract any bits from the second source operand.`
* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
* `The destination register is cleared if no bits are extracted.`
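
The quoted rules can be checked against a small software model. The sketch below assumes a 32-bit operand and a pattern that extracts the low `nbits` of `X`; `bextr` follows only the manual quotes above, and the helper names are hypothetical rather than taken from the patch.

```python
OPSIZE = 32
OPMASK = (1 << OPSIZE) - 1


def bextr(src, control):
    """Model BEXTR: control[7:0] is START, control[15:8] is LENGTH."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    if start >= OPSIZE:
        return 0  # a START beyond the operand size extracts no bits
    return ((src & OPMASK) >> start) & ((1 << length) - 1)


def extract_low_bits_of_lshr(y, z, nbits):
    """Unfolded form: shift first, then extract from bit 0."""
    x = (y & OPMASK) >> z
    return bextr(x, nbits << 8)


def extract_with_folded_shift(y, z, nbits):
    """Folded form: zero-extend the shift amount and 'or' it into control."""
    control = (nbits << 8) | (z & 0xFF)
    return bextr(y, control)
```

In this model the two forms agree for every shift amount below the operand size, which is the only range in which `lshr` is defined.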

FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases.

Reviewers: RKSimon, craig.topper, spatel, andreadb

Reviewed By: RKSimon, craig.topper, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54095

llvm-svn: 347048
2018-11-16 13:04:54 +00:00
Alex Bradbury 2146e8fb1e [RISCV] Constant materialisation for RV64I
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).

test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.

Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated by loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.
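
As an illustration of the mask-like case mentioned above, here is a sketch of the arithmetic behind "load -1 and shift right" on a 64-bit register. It is not code from RISCVMatInt, and the helper names are hypothetical.

```python
XLEN = 64
ALL_ONES = (1 << XLEN) - 1  # register contents after loading -1


def srli(value, shamt):
    """Logical right shift on an XLEN-bit register value."""
    return (value & ALL_ONES) >> shamt


def low_mask_via_shift(num_low_bits):
    """Build a mask of num_low_bits ones: load -1, then srli by XLEN - n."""
    if not 1 <= num_low_bits <= XLEN:
        raise ValueError("mask width must be between 1 and XLEN")
    return srli(ALL_ONES, XLEN - num_low_bits)
```

For example, `low_mask_via_shift(32)` yields the constant 0xFFFFFFFF from two instructions' worth of work.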

Differential Revision: https://reviews.llvm.org/D52962

llvm-svn: 347042
2018-11-16 10:14:16 +00:00
Anton Korobeynikov cad2b83182 [MSP430] Add more tests for ABI and calling convention
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54582

llvm-svn: 347040
2018-11-16 09:47:58 +00:00
Craig Topper 079c37da58 [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening.
By promoting the multiply early to use an i16 element type, we can avoid having op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.

llvm-svn: 347032
2018-11-16 06:15:21 +00:00
Craig Topper dc957d49f9 [X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization.
llvm-svn: 347031
2018-11-16 06:15:20 +00:00
Matt Arsenault eabb8dd015 AMDGPU: Fix analyzeBranch failing with pseudoterminators
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.

llvm-svn: 347027
2018-11-16 05:03:02 +00:00
Craig Topper c93ae2b0a2 Revert r347014 "[X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization."
Apparently I failed to update this after turning sign extend to any extend.

llvm-svn: 347015
2018-11-16 01:57:55 +00:00
Craig Topper 36920b44f7 [X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization.
llvm-svn: 347014
2018-11-16 01:52:32 +00:00
Craig Topper 5802b82b40 [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply.
We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend.

This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.
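
The claim that the kind of extend does not matter can be checked exhaustively for a single i8 lane. This is only a sketch; the "any extend" here is modelled by filling the upper byte with an arbitrary constant, which is an assumption made for illustration.

```python
def zero_extend_i8(x):
    return x & 0xFF


def sign_extend_i8(x):
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def any_extend_i8(x):
    # Upper bits are unspecified; 0xA5 stands in for arbitrary garbage.
    return (x & 0xFF) | 0xA500


def mul_i8_via_i16(a, b, extend):
    """Extend both operands, multiply in 16 bits, truncate back to 8 bits."""
    wide = (extend(a) * extend(b)) & 0xFFFF
    return wide & 0xFF
```

All three extends give the same truncated product, because the low 8 bits of a product depend only on the low 8 bits of its operands.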

llvm-svn: 347011
2018-11-16 01:16:59 +00:00
Ron Lieberman cac749ac88 [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008
2018-11-16 01:13:34 +00:00
Heejin Ahn 095796a391 [WebAssembly] Split BBs after throw instructions
Summary:
The `throw` instruction is a terminator in wasm, but BBs were not split
after `throw` instructions, causing machine instruction verifier to
fail.

This patch
- Splits BBs after `throw` instructions in WasmEHPrepare and adds an
  unreachable instruction after `throw`, which will be deleted in
  LateEHPrepare pass
- Refactors WasmEHPrepare into two member functions
- Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass
  to match that of WasmEHPrepare pass, which is newly added. Now
  `eraseBBsAndChildren` does not delete BBs with remaining predecessors.
- Fixes style nits, making static function names conform to clang-tidy
- Re-enables the test temporarily disabled by rL346840 && rL346845

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54571

llvm-svn: 347003
2018-11-16 00:47:18 +00:00
Craig Topper 73bb04ab6f [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.

In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs.

Differential Revision: https://reviews.llvm.org/D54512

llvm-svn: 346980
2018-11-15 18:59:31 +00:00
Simon Pilgrim 0db8cb0147 [X86] Fix MCNullStreamer support for modules with a CodeView flag
This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag.

The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks.

Committed on behalf of @eush (Eugene Sharygin)

Differential Revision: https://reviews.llvm.org/D54008

llvm-svn: 346962
2018-11-15 15:17:15 +00:00
Alex Bradbury 7727240438 [RISCV] Mark FREM as Expand
Mark the FREM SelectionDAG node as Expand, which is necessary in order to 
support the frem IR instruction on RISC-V. This is expanded into a library 
call. Adds the corresponding test. Previously, this would have triggered an 
assertion at instruction selection time.

Differential Revision: https://reviews.llvm.org/D54159
Patch by Luís Marques.

llvm-svn: 346958
2018-11-15 14:46:11 +00:00
Anton Korobeynikov 49045c6a0d [MSP430] Add MC layer
Reapply r346374 with the fixes for modules build.

Original summary:

This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Patch by Michael Skvortsov!

llvm-svn: 346948
2018-11-15 12:29:43 +00:00
Craig Topper 553ac560aa [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization.
This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.

llvm-svn: 346936
2018-11-15 08:23:40 +00:00
Craig Topper 926dbdd601 [X86] Add -x86-experimental-vector-widening-legalization versions of shuffle-vs-trunc tests.
llvm-svn: 346935
2018-11-15 08:23:37 +00:00
Konstantin Zhuravlyov 7d1532d333 AMDGPU: Fix check lines in fdot2 test:
GCN900 -> GFX900

llvm-svn: 346925
2018-11-15 02:42:04 +00:00
Konstantin Zhuravlyov a25e0524c0 AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186

llvm-svn: 346923
2018-11-15 02:32:43 +00:00
Craig Topper ea6ced9d1a [X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization.
The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening.

llvm-svn: 346916
2018-11-15 00:21:41 +00:00
Craig Topper 0b2089da4b [X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization.
On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead.

There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode so it's not a new issue.

llvm-svn: 346908
2018-11-14 23:02:09 +00:00
Simon Pilgrim e8cc5e4e03 [X86] Update masked expandload/compressstore test names
llvm-svn: 346903
2018-11-14 22:44:08 +00:00
Simon Pilgrim 9d9353aef5 [X86][SSE] Add SSE2/SSE42 masked load/store tests
Now that the load/store tests are split, running the tests on multiple (illegal) targets has much less impact.

llvm-svn: 346896
2018-11-14 21:31:50 +00:00
Nirav Dave 1241dcb3cf Bias physical register immediate assignments
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also cover physical
register assignments from immediate values.

This causes a reduction in overall register pressure and a
minor reduction in spills, and indirectly fixes an out-of-registers
assertion (PR39391).

Most test changes are from minor instruction reorderings and register
name selection changes and direct consequences of that.

Reviewers: MatzeB, qcolombet, myatsina, pcc

Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
  javed.absar, arphaman, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54218

llvm-svn: 346894
2018-11-14 21:11:53 +00:00
Simon Pilgrim be527b545f [X86] Split masked load/store test files
llvm-svn: 346889
2018-11-14 20:44:59 +00:00
Simon Pilgrim 7f15568c40 [X86] Update masked load/store test names
llvm-svn: 346887
2018-11-14 20:25:50 +00:00
Aakanksha Patil 1a60116b5c AMDGPU: Additional pattern for i16 median3 matching
min(max(a, b), max(min(a, b), c))

Differential Revision: https://reviews.llvm.org/D54494

llvm-svn: 346886
2018-11-14 20:10:41 +00:00
Craig Topper 6c94264b1f [X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization
Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.

Differential Revision: https://reviews.llvm.org/D54513

llvm-svn: 346879
2018-11-14 18:16:21 +00:00
Simon Pilgrim 7501780ec6 [X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG
This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.

The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical, although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.

Differential Revision: https://reviews.llvm.org/D54083

llvm-svn: 346850
2018-11-14 11:26:35 +00:00
Craig Topper 789cc8170d [X86] Add -x86-experimental-vector-widening command lines to pmulh.ll
I've only added sse2 and sse4.1 variants as I'm only interested in the two v4i16 tests and I don't expect that to differ with AVX other than a v prefix.

llvm-svn: 346834
2018-11-14 07:51:26 +00:00
Heejin Ahn da419bdb5e [WebAssembly] Add support for the event section
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)

The event section is added between the global section and the export
section. This is for ease of validation per request of the V8 team.

This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support

Reviewers: dschuff, sbc100, aardappel

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54096

llvm-svn: 346825
2018-11-14 02:46:21 +00:00
Zi Xuan Wu 6a3c279d1c [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
Make ISD::VSELECT available (legal) as long as AltiVec instructions are available; otherwise its default behavior is to expand,
which is legalized at the type-legalization phase. Use xxsel to match vselect if VSX is enabled, otherwise use vsel.


Differential Revision: https://reviews.llvm.org/D49531

llvm-svn: 346824
2018-11-14 02:34:45 +00:00
Eli Friedman 6bdabcf368 [CodeGen] Fix forward scan in MachineBasicBlock::computeRegisterLiveness.
The scan was incorrectly skipping the first instruction, so a register
could appear to be dead when it was actually live. This eventually leads
to a machine verifier failure and miscompile in arm-ldst-opt.
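
The off-by-one can be illustrated with a heavily simplified model of a forward liveness scan. This sketch is not the code from MachineBasicBlock: instructions are plain dicts of register names, and the real function also handles block ends, live-outs, and sub-registers.

```python
def forward_liveness(instrs, start, reg, window=10):
    """Classify reg just before instrs[start] as 'live', 'dead' or 'unknown'."""
    for instr in instrs[start:start + window]:
        if reg in instr["reads"]:
            return "live"  # read before any redefinition
        if reg in instr["writes"]:
            return "dead"  # overwritten before any read
    return "unknown"


def buggy_forward_liveness(instrs, start, reg, window=10):
    """Same scan, but it wrongly skips the first instruction."""
    for instr in instrs[start + 1:start + 1 + window]:
        if reg in instr["reads"]:
            return "live"
        if reg in instr["writes"]:
            return "dead"
    return "unknown"
```

When the first instruction reads the register and a later one overwrites it, the skipping variant reports "dead" even though the register is live.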

Differential Revision: https://reviews.llvm.org/D54491

llvm-svn: 346821
2018-11-14 00:39:29 +00:00
Stanislav Mekhanoshin bcb34ac2ea [AMDGPU] combine extractelement into several selects
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.
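
The shape of the resulting select chain can be sketched as follows. This is an illustration only, not the DAG combine itself; in particular, returning element 0 for an out-of-range index is an arbitrary choice of this sketch.

```python
def extract_via_selects(vec, idx):
    """Pick vec[idx] with one compare-and-select per element, no indexing by idx."""
    result = vec[0]
    for i in range(1, len(vec)):
        # Corresponds to: result = select (idx == i), vec[i], result
        result = vec[i] if idx == i else result
    return result
```

The chain grows linearly with the vector length, which is why the transform is limited to vectors that are not too big.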

Differential Revision: https://reviews.llvm.org/D54351

llvm-svn: 346800
2018-11-13 21:18:21 +00:00
Stanislav Mekhanoshin 35de877e8c Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling
Legalizer used to request an ext load from i8 to i1 when promoting
vector element type to i8. Fixed.

Differential Revision: https://reviews.llvm.org/D54440

llvm-svn: 346795
2018-11-13 20:26:27 +00:00
Sam Clegg f98ba05f3d [WebAssembly] Fix broken assumption that all bitcasts are to functions types
Specifically, we can bitcast to void.

Fixes PR39591

Differential Revision: https://reviews.llvm.org/D54447

llvm-svn: 346778
2018-11-13 19:14:02 +00:00
Cameron McInally cbde0d9c7b [IR] Add a dedicated FNeg IR Instruction
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
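
The distinction can be sketched on 32-bit float encodings: a bitwise negation only flips the sign bit, whatever the payload. The helper names are hypothetical, and the subtraction is evaluated by the host, so only non-NaN inputs are compared against it here.

```python
import struct


def f32_bits(x):
    """Return the IEEE-754 single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fneg_bits(bits):
    """Bitwise negation: flip the sign bit and leave everything else alone."""
    return bits ^ 0x80000000


def fsub_from_neg_zero_bits(x):
    """Arithmetic form: compute -0.0 - x and return the result's encoding."""
    return f32_bits(-0.0 - x)
```

For ordinary values and zeros the two agree; for NaNs the bitwise form always flips the sign bit, whereas an arithmetic subtraction is not required to.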

Differential Revision: https://reviews.llvm.org/D53877

llvm-svn: 346774
2018-11-13 18:15:47 +00:00
Simon Atanasyan 9d87256d3d [WebAssembly] Mark immediates.ll as XFAILed on MIPS hosts
Usually MIPS hosts use a legacy (non IEEE 754-2008) encoding for NaNs.
Tests like `nan_f32` fail when comparing a hard-coded IEEE
754-2008 NaN value with a legacy NaN value provided by the system.

llvm-svn: 346773
2018-11-13 18:14:29 +00:00
Jonas Paulsson f9b2b5e67e [SystemZ] Increase the number of VLREPs
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.

This patch handles this by putting the other users of the load to use the
REPLICATE 0-element instead of the load. This way the load has only the
REPLICATE node as user, and we get a VLREP.

Review: Ulrich Weigand
https://reviews.llvm.org/D54264

llvm-svn: 346746
2018-11-13 08:37:09 +00:00
Craig Topper 333ab7d08b [X86] Add more tests for -x86-experimental-vector-widening-legalization
I'm looking into whether we can make this the default legalization strategy. Adding these tests to help cover the changes that will be necessary.

This patch adds copies of some tests with the command line switch enabled. By making copies it's easier to compare the two legalization strategies.

I've also removed RUN lines from some of these tests that already had -x86-experimental-vector-widening-legalization

llvm-svn: 346745
2018-11-13 07:47:52 +00:00
Simon Pilgrim e565e5a962 [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)
This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.

Differential Revision: https://reviews.llvm.org/D54267

llvm-svn: 346706
2018-11-12 21:12:38 +00:00
Aakanksha Patil a992c694c6 AMDGPU: Adding more median3 patterns
min(max(a, b), max(min(a, b), c)) -> med3 a, b, c
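
That this min/max expression really is the median of three values can be checked by brute force. The sketch below is just such a check, with hypothetical helper names.

```python
def med3_pattern(a, b, c):
    """The matched pattern: min(max(a, b), max(min(a, b), c))."""
    return min(max(a, b), max(min(a, b), c))


def median3(a, b, c):
    """Reference median: the middle element after sorting."""
    return sorted((a, b, c))[1]
```

The two functions agree on every ordering of the inputs, including ties.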

Differential Revision: https://reviews.llvm.org/D54331

llvm-svn: 346704
2018-11-12 21:04:06 +00:00
Wouter van Oortmerssen cc75e77df5 [WebAssembly] Added WasmAsmParser.
Summary:
This is to replace the ELFAsmParser that WebAssembly was using, which
so far was a stub that didn't do anything, and couldn't work correctly
with wasm.

This new class is there to implement generic directives related to
wasm as a binary format. Wasm target specific directives are still
parsed in WebAssemblyAsmParser as before. The two classes now
cooperate more correctly too.

Also implemented .result which was missing. Any unknown directives
will now result in errors.

Reviewers: dschuff, sbc100

Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54360

llvm-svn: 346700
2018-11-12 20:15:01 +00:00
Craig Topper c48712b341 [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS.
Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.

This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.

llvm-svn: 346697
2018-11-12 19:37:29 +00:00
Stanislav Mekhanoshin e86c8d33b1 [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z
Sometimes after basic block placement we end up with code like:

  sreg = s_mov_b64 -1
  vcc = s_and_b64 exec, sreg
  s_cbranch_vccz

This happens as a join of a block assigning -1 to a saved mask and
another block which consumes that saved mask with s_and_b64 and a
branch.

This is essentially a single s_cbranch_execz instruction when moved
into a single new basic block.
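
The equivalence rests on `exec & -1 == exec`. A minimal sketch with 64-bit masks as Python integers (the function names are hypothetical):

```python
MASK64 = (1 << 64) - 1


def branch_taken_vccz(exec_mask):
    """Model the three-instruction sequence ending in s_cbranch_vccz."""
    sreg = MASK64           # sreg = s_mov_b64 -1
    vcc = exec_mask & sreg  # vcc = s_and_b64 exec, sreg
    return vcc == 0         # s_cbranch_vccz branches when vcc is zero


def branch_taken_execz(exec_mask):
    """Model the replacement: s_cbranch_execz branches when exec is zero."""
    return (exec_mask & MASK64) == 0
```

Both predicates give the same answer for any exec mask.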

Differential Revision: https://reviews.llvm.org/D54164

llvm-svn: 346690
2018-11-12 18:48:17 +00:00
Stanislav Mekhanoshin 5f9513147a Fix MachineInstr::findRegisterUseOperandIdx subreg checks
The function only checks that the instruction reads a super-register
containing the requested physical register. If a sub-register
is being read, that is also a use of the super-reg, so this adds that check.
In particular MI->readsRegister() is broken because of the missing
check. The resulting check is essentially regsOverlap().
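
The difference between the old and new checks can be sketched by modelling each register as a set of register units. The register names and unit assignments below are hypothetical and stand in for a real target's register hierarchy.

```python
# Hypothetical registers: D0 is a super-register made of W0 and W1.
REG_UNITS = {
    "W0": {"u0"},
    "W1": {"u1"},
    "D0": {"u0", "u1"},
    "W2": {"u2"},
}


def is_super_or_same(operand_reg, reg):
    """Old check: the operand must contain every unit of the queried register."""
    return REG_UNITS[reg] <= REG_UNITS[operand_reg]


def regs_overlap(operand_reg, reg):
    """New check: any shared unit counts as a use."""
    return bool(REG_UNITS[operand_reg] & REG_UNITS[reg])
```

An instruction reading W0 is a use of D0 under `regs_overlap`, but the containment-only check misses it.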

Differential Revision: https://reviews.llvm.org/D54128

llvm-svn: 346686
2018-11-12 18:12:28 +00:00
Paul Robinson 5b302bfc8e [DWARFv5] Emit split type units in .debug_info.dwo.
Differential Revision: https://reviews.llvm.org/D54350

llvm-svn: 346674
2018-11-12 16:55:11 +00:00
Alex Bradbury 9c03e4cacd [RISCV] Support .option relax and .option norelax
This extends the .option support from D45864 to enable/disable the relax 
feature flag from D44886

During parsing of the relax/norelax directives, the RISCV::FeatureRelax 
feature bits of the SubtargetInfo stored in the AsmParser are updated 
appropriately to reflect whether relaxation is currently enabled in the 
parser. When an instruction is parsed, the parser checks if relaxation is 
currently enabled and if so, gets a handle to the AsmBackend and sets the 
ForceRelocs flag. The AsmBackend uses a combination of the original 
RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the 
ForceRelocs flag to determine whether to emit relocations for symbol and 
branch diffs. Diff relocations should therefore only not be emitted if the 
relax flag was not set on the command line and no instruction was ever parsed 
in a section with relaxation enabled to ensure correct diffs are emitted.
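
The decision described above can be summarised in a small state model. This is a sketch of the logic as stated in this message, not the parser or AsmBackend code: the input is a flat list of strings, and anything that is not an `.option` directive is treated as an instruction.

```python
def parse_force_relocs(lines, relax_on_command_line):
    """Return True if any instruction was parsed while relaxation was enabled."""
    relax_enabled = relax_on_command_line
    force_relocs = False
    for line in lines:
        if line == ".option relax":
            relax_enabled = True
        elif line == ".option norelax":
            relax_enabled = False
        elif relax_enabled:
            force_relocs = True  # an instruction seen with relaxation on
    return force_relocs


def should_emit_diff_relocs(relax_on_command_line, force_relocs):
    """Diff relocations are skipped only if neither source enabled relaxation."""
    return relax_on_command_line or force_relocs
```

A file assembled without the relax feature still gets diff relocations once a single instruction follows `.option relax`.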

Differential Revision: https://reviews.llvm.org/D46423
Patch by Lewis Revill.

llvm-svn: 346655
2018-11-12 14:25:07 +00:00
Nirav Dave a395e2df56 [DAGCombiner] Fix load-store forwarding of indexed loads.
Summary:
Handle extra output from indexed loads in cases where we wish to
forward a load value directly from a preceding store.

Fixes PR39571.

Reviewers: peter.smith, rengolin

Subscribers: javed.absar, hiraditya, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54265

llvm-svn: 346654
2018-11-12 14:05:40 +00:00
Jonas Paulsson c0ee028dc3 [SystemZ] Replicate the load with most uses in buildVector()
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.
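
The selection heuristic can be sketched as a simple count. This is an illustration only: vector elements are represented by hypothetical load names, with `None` for elements that do not come from a load.

```python
from collections import Counter


def pick_load_to_replicate(elements):
    """Return the load used by the most elements, or None if there is none."""
    counts = Counter(e for e in elements if e is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def insertions_needed(elements, replicated):
    """Elements not covered by the replicated load need an element insertion."""
    return sum(1 for e in elements if e != replicated)
```

Replicating the most-used load leaves the fewest elements to insert afterwards.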

Review: Ulrich Weigand
https://reviews.llvm.org/D54322

llvm-svn: 346637
2018-11-12 08:12:20 +00:00
Sanjay Patel 622b71d40a [x86] auto-generate complete checks; NFC
llvm-svn: 346609
2018-11-11 14:57:26 +00:00
Craig Topper 2eab39f77b [X86] Use DAG.getConstant instead of getZeroVector.
llvm-svn: 346605
2018-11-11 07:24:36 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00