Commit Graph

9147 Commits

Author SHA1 Message Date
Nirav Dave f54c8370e5 [DAG] Convert extload check to equivalent type check. NFC.
Replace check with check that consuming store has the same type.

llvm-svn: 309708
2017-08-01 17:19:41 +00:00
Nirav Dave b5a0af6b6e [DAG] Move extload check in store merge. NFC.
Move candidate check from later check to initial candidate check.

llvm-svn: 309698
2017-08-01 16:00:47 +00:00
Nirav Dave b5cb48c6ae [DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Summary:
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.

Reviewers: efriedma, RKSimon, spatel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35566

llvm-svn: 309680
2017-08-01 13:45:35 +00:00
Eli Friedman 37f41d1e7c [ScheduleDAG] Don't schedule node with physical register interference
https://reviews.llvm.org/D31536 didn't really solve the problem it was
trying to solve; it got rid of the assertion failure, but we were still
scheduling the DAG incorrectly (mixing together instructions from
different calls), leading to a MachineVerifier failure.

In order to schedule the DAG correctly, we have to make sure we don't
schedule a node which should be blocked by an interference. Fix
ScheduleDAGRRList::PickNodeToScheduleBottomUp so it doesn't pick a node
like that.

The added call to FindAvailableNode() is the key change here; this makes
sure we don't try to schedule a call while we're in the middle of
scheduling a different call. I'm not sure this is the right approach; in
particular, I'm not sure how to prove we don't end up with an infinite
loop of repeatedly backtracking.

This also reverts the code change from D31536. It doesn't do anything
useful: we should never schedule an ADJCALLSTACKDOWN unless we've
already scheduled the corresponding ADJCALLSTACKUP.

Differential Revision: https://reviews.llvm.org/D33818

llvm-svn: 309642
2017-08-01 00:28:40 +00:00
Simon Dardis 8930b4a049 [SelectionDAG][mips] Fix PR33883
PR33883 shows that calls to intrinsic functions should not have their vector
arguments or returns subject to ABI changes required by the target.

This resolves PR33883.

Thanks to Alex Crichton for reporting the issue!

Reviewers: zoran.jovanovic, atanasyan

Differential Revision: https://reviews.llvm.org/D35765

llvm-svn: 309561
2017-07-31 14:06:58 +00:00
Simon Pilgrim 718cb0ea62 [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits
This patch is in 2 parts:

1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.

Differential Revision: https://reviews.llvm.org/D35896

llvm-svn: 309486
2017-07-29 14:50:25 +00:00
Adrian Prantl 8b9bb534a1 Remove the unused offset from DBG_VALUE (NFC)
Followup to r309426.
rdar://problem/33580047

llvm-svn: 309450
2017-07-28 23:00:45 +00:00
Adrian Prantl a617576bb1 Remove the unused dbg.value offset from SelectionDAG (NFC)
Followup to r309426.
rdar://problem/33580047

llvm-svn: 309436
2017-07-28 21:27:35 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Simon Pilgrim ac84850ea6 [SelectionDAG] Improve DAGTypeLegalizer::convertMask assertion (PR33960)
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.

llvm-svn: 309302
2017-07-27 18:15:54 +00:00
Simon Pilgrim 64a795c5f5 [SelectionDAG] Avoid repeated calls to getNumOperands in for loop. NFCI.
llvm-svn: 309283
2017-07-27 15:42:21 +00:00
Simon Pilgrim 0d543b5921 [SelectionDAG] Tidyup mask creation. NFCI.
Assign all concat elements to UNDEF and then just replace the first element, instead of copying everything individually.

llvm-svn: 309277
2017-07-27 15:08:53 +00:00
Peter Collingbourne 081ffe2ff2 Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This was a use-after-free waiting to happen.

llvm-svn: 309159
2017-07-26 19:15:29 +00:00
Zvi Rackover 092f199188 DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Summary:
Adding support for combining power2-strided build_vector's where the
first build_vectori's operand is extracted from a non-zero index.

Example:

 v4i32 build_vector((extract_elt V, 1),
                    (extract_elt V, 3),
                    (extract_elt V, 5),
                    (extract_elt V, 7))
 -->
 v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)

Reviewers: delena, RKSimon, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35700

llvm-svn: 309108
2017-07-26 12:57:03 +00:00
Simon Pilgrim 6d59933175 [DAG] Move DAGCombiner::GetDemandedBits to SelectionDAG
This patch moves the DAGCombiner::GetDemandedBits function to SelectionDAG::GetDemandedBits as a first step towards making it easier for targets to get to the source of any demanded bits without the limitations of SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D35841

llvm-svn: 308983
2017-07-25 16:36:44 +00:00
Francois Pichet 82bf3de606 Fix endianness bug in DAGCombiner::visitTRUNCATE and visitEXTRACT_VECTOR_ELT
Summary:
Do not assume little endian architecture in DAGCombiner::visitTRUNCATE and DAGCombiner::visitEXTRACT_VECTOR_ELT.
PR33682

Reviewers: hfinkel, sdardis, RKSimon

Reviewed By: sdardis, RKSimon

Subscribers: uabelho, RKSimon, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D34990

llvm-svn: 308960
2017-07-25 09:40:35 +00:00
Nirav Dave 4e6dcf73f9 [DAG] Fix typo preventing some stores merges to truncated stores.
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.

Reviewers: efriedma, RKSimon, spatel, jyknight

Reviewed By: efriedma

Subscribers: llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D35623

llvm-svn: 308833
2017-07-23 02:06:28 +00:00
Xin Tong 495a3022da [DAGCombiner] Update comment. NFC
llvm-svn: 308772
2017-07-21 19:10:19 +00:00
Nirav Dave 4aa51c3af1 [DAG] Commit missed nit cleanup from r308617. NFC.
llvm-svn: 308645
2017-07-20 18:07:57 +00:00
Nirav Dave df86d2d008 [DAG] Handle missing transform in fold of value extension case.
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.

Reviewers: spatel, RKSimon, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35030

llvm-svn: 308618
2017-07-20 13:57:32 +00:00
Nirav Dave 77cc6f23b9 [DAG] Optimize away degenerate INSERT_VECTOR_ELT nodes.
Summary:
Add missing vector write of vector read reduction, i.e.:

(insert_vector_elt x (extract_vector_elt x idx) idx) to x

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35563

llvm-svn: 308617
2017-07-20 13:48:17 +00:00
Simon Pilgrim 2911296f10 [DAGCombiner] Match ISD::SRL non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRL constant folds

llvm-svn: 308602
2017-07-20 11:03:30 +00:00
Simon Pilgrim b9ff25df59 Remove trailing whitespace. NFCI.
llvm-svn: 308601
2017-07-20 10:43:52 +00:00
Simon Pilgrim 7ff0e49d8c [DAGCombiner] Match ISD::SRA non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRA constant folds

llvm-svn: 308600
2017-07-20 10:43:05 +00:00
Simon Pilgrim 9d7863b935 [DAGCombiner] Match non-uniform constant vectors using predicates.
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.

This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.

I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.

Differential Revision: https://reviews.llvm.org/D35492

llvm-svn: 308598
2017-07-20 10:13:40 +00:00
Simon Pilgrim c77e262260 {DAGCombine] Convert (Val & Mask) == Mask to Mask.isSubsetof(Val). NFCI.
llvm-svn: 308460
2017-07-19 13:39:58 +00:00
Nirav Dave d839749ae8 [DAG] Improve Aliasing of operations to static alloca
Re-recommiting after landing DAG extension-crash fix.

Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308350
2017-07-18 20:06:24 +00:00
Nirav Dave 041b87758a [DAG] Reverse node replacement in extension operation. NFCI.
Reorder replacements to be user first in preparation for multi-level
folding to premptively avoid inadvertantly deleting later nodes from
sharing found from replacement.

llvm-svn: 308348
2017-07-18 19:49:20 +00:00
Nirav Dave 07871007aa [DAG] Avoid deleting nodes before combining them.
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.

Fixes PR32515.

Reviewers: dbabokin, RKSimon, davide, chandlerc

Subscribers: chandlerc, llvm-commits

Differential Revision: https://reviews.llvm.org/D34095

llvm-svn: 308330
2017-07-18 17:39:15 +00:00
Nirav Dave f87c8e82f6 [DAG] Allow base element type of store merge type to also be a vector.
Correctly calculate merged vector size if MemVT is already a vector.

llvm-svn: 308312
2017-07-18 14:39:09 +00:00
Simon Pilgrim 4793a11df9 [DAGCombine] Fix issue with out of bound constant rotation (PR33828)
Take the modulo of rotations by a constant greater than or equal to the bit-width

llvm-svn: 308302
2017-07-18 12:31:46 +00:00
Chandler Carruth a15e080b05 Revert r308025 due to uncovering a crash in SelectionDAG. This is filed
with a minimal test case in http://llvm.org/PR33833.

Original commit message:
  Improve Aliasing of operations to static alloca

llvm-svn: 308271
2017-07-18 07:53:47 +00:00
Andrew Zhogin 67a64041b9 [DAGCombiner] Recognise vector rotations with non-splat constants
Fixes PR33691.

Differential revision: https://reviews.llvm.org/D35381

llvm-svn: 308150
2017-07-16 23:11:45 +00:00
Simon Pilgrim e7a2e6bdf1 Strip trailing whitespace. NFCI
llvm-svn: 308108
2017-07-15 19:29:19 +00:00
Nirav Dave a8f63af9d1 Improve Aliasing of operations to static alloca
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308025
2017-07-14 13:56:21 +00:00
Simon Dardis 250256f9c9 Reland "[mips] Fix multiprecision arithmetic."
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This revolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

The previous version of this patch was too aggressive in producing fused
integer multiple-addition instructions.

llvm-svn: 307906
2017-07-13 11:28:05 +00:00
Simon Pilgrim bb85cb16e3 [DAGCombiner] Fix issue with rotate combines asserting if the constant value types differ from the result type.
llvm-svn: 307900
2017-07-13 10:41:49 +00:00
Simon Pilgrim 2dc42b7202 Use isNullConstantOrNullSplatConstant helper. NFCI.
llvm-svn: 307895
2017-07-13 09:39:00 +00:00
Geoff Berry bea2e188e9 [TargetLowering] Add hook for adding target MMO flags when doing ISel.
Summary: Add TargetLowering hook getMMOFlags() to add target specific
MMO flags to load/store instructions created by ISel.

Reviewers: bogner, hfinkel, qcolombet, MatzeB

Subscribers: mcrosier, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34962

llvm-svn: 307879
2017-07-13 03:49:42 +00:00
Daniel Neilson 965613ef1b Add element atomic memset intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D34885

llvm-svn: 307854
2017-07-12 21:57:23 +00:00
Daniel Neilson 57226ef33c Add element atomic memmove intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34884

llvm-svn: 307796
2017-07-12 15:25:26 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Matthias Braun b38736706e Revert "[DAG] Improve Aliasing of operations to static alloca"
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.

This reverts commit r307546.

llvm-svn: 307589
2017-07-10 20:51:30 +00:00
Nirav Dave 4dcad5dc6b Add DAG argument to canMergeStoresTo NFC.
llvm-svn: 307583
2017-07-10 20:25:54 +00:00
Nirav Dave 163e1ad9dc [DAG] Improve Aliasing of operations to static alloca
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 307546
2017-07-10 15:39:41 +00:00
Hiroshi Inoue a86c920b1e fix typos in comments and error messages; NFC
llvm-svn: 307533
2017-07-10 12:44:25 +00:00
Davide Italiano c4b0ccd049 [X86] Relax an assertion when legalizing vector types.
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.

Fixes PR33715.

llvm-svn: 307508
2017-07-09 19:22:48 +00:00
Simon Pilgrim 55a4b6700f Handle ConstantExpr correctly in SelectionDAGBuilder
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.

Fixes PR#33094.

Submitted on behalf of @Praetonus (Benoit Vey)

Differential Revision: https://reviews.llvm.org/D34538

llvm-svn: 307502
2017-07-09 16:01:04 +00:00
Igor Breger b80b44b7b9 [FastISel] fix a fallback diagnostic.
Summary: FastISel was marked as failed in case instruction selection succeeded.

Reviewers: qcolombet, zvi, rovka, ab

Reviewed By: zvi

Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits

Differential Revision: https://reviews.llvm.org/D34438

llvm-svn: 307489
2017-07-09 05:55:20 +00:00
Sanjay Patel 4cea2ec254 [DAGCombiner] use local variable to shorten code; NFCI
llvm-svn: 307429
2017-07-07 19:34:42 +00:00
Vadim Chugunov e6f76558c7 Fix libcall expansion creating DAG nodes with invalid type post type legalization.
If we are lowering a libcall after legalization, we'll split the return type into a pair of legal values.

Patch by Jatin Bhateja and Eli Friedman.

Differential Revision: https://reviews.llvm.org/D34240

llvm-svn: 307207
2017-07-05 22:01:49 +00:00
Simon Pilgrim ac78daf517 {DAGCombiner] Fold (rot x, 0) -> x
llvm-svn: 307184
2017-07-05 18:27:11 +00:00
Andrew Zhogin 45d192823e [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand.
For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize.

Differential revision: https://reviews.llvm.org/D12833

llvm-svn: 307179
2017-07-05 17:55:42 +00:00
Nirav Dave b320ef9fab Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
Relanding after rewriting undef.ll test to avoid host-dependant
endianness.

As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 307114
2017-07-05 01:21:23 +00:00
Hiroshi Inoue 79f8933f23 fix trivial typos in comments; NFC
llvm-svn: 307094
2017-07-04 16:35:26 +00:00
Andrew Zhogin de5d250a0b [DAGCombiner] Intermediate variables in visitRotate promoted to the function's begin. NFC precommit for D12833.
llvm-svn: 307091
2017-07-04 15:57:39 +00:00
Anna Thomas a66a98cc74 [FastISel][SelectionDAG]Teach fastISel about GC intrinsics
Summary:
We are crashing in LLC at O0 when gc intrinsics are present in the block.
The reason being FastISel performs basic block ISel by modifying GC.relocates
to be the first instruction in the block. This can cause us to visit the GC
relocate before it's corresponding GC.statepoint is visited, which is incorrect.
When we lower the statepoint, we record the base and derived pointers, along
with the gc.relocates. After this we can visit the gc.relocate.

This patch avoids fastISel from incorrectly creating the block with gc.relocate
as the first instruction.

Reviewers: qcolombet, skatkov, qikon, reames

Reviewed by: skatkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34421

llvm-svn: 307084
2017-07-04 15:09:09 +00:00
Nirav Dave a2810e677b [DAG] Fixed predicate for determining when two frame indices
addresses are comparable. NFCI.

llvm-svn: 307055
2017-07-04 02:20:17 +00:00
Anton Yartsev 66d32c5e06 [legalize-types] Clean up softening machinery.
The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. This prevents softening mashinery from leaving stale entries in SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265.

Differential Revision: https://reviews.llvm.org/D31946

llvm-svn: 307053
2017-07-04 01:08:55 +00:00
Zvi Rackover d7a1c334ce DAGCombine: Combine BUILD_VECTOR to TRUNCATE
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena

Reviewed By: delena

Subscribers: guyblank, delena, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34077

llvm-svn: 307036
2017-07-03 15:47:40 +00:00
Craig Topper 92a8fe34e5 [SelectionDAGBuilder] Use EVT::getVectorVT instead of MVT::getVectorVT to prevent a crash if the type isn't a simple VT.
llvm-svn: 306950
2017-07-01 06:46:09 +00:00
Nirav Dave a35938d827 Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset"
This reverts commit r306819 which appears be exposing underlying
issues in a stage1 ppc64be build

llvm-svn: 306820
2017-06-30 12:56:02 +00:00
Nirav Dave c5a48c1ee8 [DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 306819
2017-06-30 12:23:41 +00:00
Simon Dardis dede76f428 Revert "[mips] Fix multiprecision arithmetic."
This reverts commit r305389. This broke chromium builds, so reverting
while I investigate further.

llvm-svn: 306741
2017-06-29 20:59:47 +00:00
Nirav Dave 168c5a6a40 [DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI.
Relanding after restricting equalBaseIndex to not erroneuosly consider
a FrameIndices stemming from alloca from being comparable as its
offset is set post-selectionDAG.

Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.

llvm-svn: 306688
2017-06-29 15:48:11 +00:00
Stanislav Mekhanoshin a45584bebe Fold fneg and fabs like multiplications
Given no NaNs and no signed zeroes it folds:

(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)

Differential Revision: https://reviews.llvm.org/D34579

llvm-svn: 306592
2017-06-28 20:25:50 +00:00
Nirav Dave c4ce2293b0 Revert "[DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI."
This reverts commit r306498 which appears to cause a compilrt-rt test failures

llvm-svn: 306501
2017-06-28 03:20:04 +00:00
Stanislav Mekhanoshin eb40733bf0 Allow to truncate left shift with non-constant shift amount
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.

Differential Revision: https://reviews.llvm.org/D34723

llvm-svn: 306499
2017-06-28 02:37:11 +00:00
Nirav Dave 8ef03802f1 [DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI.
Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.

llvm-svn: 306498
2017-06-28 02:09:50 +00:00
Hiroshi Inoue 84aafee4fb [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assetion failure
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.

This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.

Differential Revision: https://reviews.llvm.org/D34679

llvm-svn: 306404
2017-06-27 12:43:08 +00:00
Wolfgang Pieb 9f65858235 DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match.
This fixes PR33368.

Reviewer: rksimon

Differential Revision:  https://reviews.llvm.org/D34069

llvm-svn: 306345
2017-06-26 23:05:51 +00:00
Elena Demikhovsky 72f991cded AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.

Differential Revision: https://reviews.llvm.org/D34503

llvm-svn: 306242
2017-06-25 13:36:20 +00:00
Hiroshi Inoue 95f24dca98 [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove
When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure.

Differential Revision: https://reviews.llvm.org/D34467

llvm-svn: 306209
2017-06-24 15:17:38 +00:00
Nirav Dave f2c349ccec [DAG] Add Target Store Merge pass ordering function
Allow targets to specify if they should merge stores before or after
legalization.

llvm-svn: 306006
2017-06-22 15:07:49 +00:00
Nirav Dave c1b6aa77bb [DAG] Move BaseIndexOffset into separate Libarary. NFC.
Move BaseIndexOffset analysis out of DAGCombiner for use in other
files.

llvm-svn: 305921
2017-06-21 15:40:43 +00:00
Nirav Dave 9a69d444a3 [DAG] Remove Node csonstruction from BaseIndexOffset match. NFCI.
Move GlobalAddress Offset decomposition from initial match into
comparision check and removing the possibility of constructing a new
offseted global address when examining addresses.

llvm-svn: 305917
2017-06-21 15:07:30 +00:00
Guy Blank 52d73fce85 [DAGCombiner] Add another combine from build vector to shuffle
Add support for combining a build vector to a shuffle.
When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1.

llvm-svn: 305883
2017-06-21 07:38:41 +00:00
Tim Northover 208ddc5bdc DAG: correctly legalize UMULO.
We were incorrectly sign extending into the high word (as you would for
SMULO) when legalizing UMULO in terms of a wider full multiplication.

Patch by James Duley.

llvm-svn: 305800
2017-06-20 15:01:38 +00:00
Haojian Wu 6bd5cc6239 [SelectionDAG] Fix an use-after-free issue introduced in r305775.
vector.back() will be invalidated when memory reallocation happens.

llvm-svn: 305785
2017-06-20 09:29:43 +00:00
Max Kazantsev b5c3362873 [SelectionDAG] Get rid of recursion in CalcNodeSethiUllmanNumber
The recursive implementation of CalcNodeSethiUllmanNumber may
overflow stack on extremely long pred chains. This patch replaces it
with an equivalent iterative implementation.

Differential Revision: https://reviews.llvm.org/D33769

llvm-svn: 305775
2017-06-20 07:07:09 +00:00
Nirav Dave 47a78a2502 [DAG] Simplify BaseIndexOffset. NFCI.
Remove tail calls and cleanup codeflow.

llvm-svn: 305768
2017-06-20 02:48:39 +00:00
Nirav Dave 8dcd008d18 Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.

Relanding after fixing selection between truncated and normal store.

llvm-svn: 305701
2017-06-19 15:32:28 +00:00
Davide Italiano 9382c5560b [SelectionDAG] Update Loop info after splitting critical edges.
The analysis is expected to be preserved by SelectionDAG.

llvm-svn: 305621
2017-06-17 00:56:27 +00:00
Craig Topper 288b3c9e69 [SelectionDAG] Use APInt::isSubsetOf. NFC
llvm-svn: 305606
2017-06-16 23:19:14 +00:00
Craig Topper ea5b8bc9ef [SelectionDAG] Use APInt::isNullValue/isOneValue. NFC
llvm-svn: 305605
2017-06-16 23:19:12 +00:00
Craig Topper b681907c50 [TargetLowering] Use ConstantSDNode::isOne and getSExtValue instead of getting the underlying APInt first. NFC
llvm-svn: 305604
2017-06-16 23:19:10 +00:00
Daniel Neilson 3faabbbe85 [Atomics] Rename and change prototype for atomic memcpy intrinsic
Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

llvm-svn: 305558
2017-06-16 14:43:59 +00:00
Ahmed Bougacha d41a9c4e66 Revert "[DAG] Allow truncated and extend memory operations in Store Merge. NFCI."
This reverts commit r305468, as it caused PR33475.

llvm-svn: 305527
2017-06-15 23:29:47 +00:00
Benjamin Kramer 00a970a84b Fold variable into assert.
Silences an unused variable warning in Release builds.

llvm-svn: 305488
2017-06-15 17:58:24 +00:00
Arnold Schwaighofer ae9312c487 ISel: Fix FastISel of swifterror values
The code assumed that we process instructions in basic block order.  FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.

This only affects code with swifterror registers.

rdar://32659327

llvm-svn: 305484
2017-06-15 17:34:42 +00:00
Nirav Dave 9d79cade42 [DAG] As StoreMerge now generates only legal nodes remove unecessary guard when run post-legalization NFCI.
llvm-svn: 305477
2017-06-15 16:27:49 +00:00
Nirav Dave be2674a598 [DAG] Defer Pre/Post IndexStore merge to after mergestore. NFCI.
In preparation for doing storemerge post-legalization, reorder
visitSTORE passes to move pre/post-index combining after store
merge. Reordered passes other than store merge are unaffected.

llvm-svn: 305473
2017-06-15 15:05:48 +00:00
Nirav Dave 9464c72850 [DAG] Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.

llvm-svn: 305468
2017-06-15 14:04:07 +00:00
Nirav Dave 6a41822ba7 [DAG] Make MergeStores generate legalized stores. NFCI.
Realized merged stores as truncstores if store will be realized as
such by legalization.

llvm-svn: 305467
2017-06-15 13:34:54 +00:00
Nirav Dave 9a4998980d [DAG] Use correct size for truncated store merge of load. NFCI.
Avoid non-legal memory ops by checking correct size when merging
stores of loads into a extload-truncstore pair.

llvm-svn: 305466
2017-06-15 13:28:06 +00:00
Simon Dardis 9790e39f45 [mips] Fix multiprecision arithmetic.
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This revolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

llvm-svn: 305389
2017-06-14 14:46:30 +00:00
Geoff Berry 06c9dc3d9c [SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno
Summary:
This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU
target triples.  This optimization was being inhibited when -ffast-math
wasn't set because sincos in GLibC does not set errno, while sin and cos
do.  However, this optimization will only run if the attributes on the
sin/cos calls include readnone, which is how clang represents the fact
that it doesn't care about the errno values set by these functions (via
the -fno-math-errno flag).

Reviewers: hfinkel, bogner

Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond

Differential Revision: https://reviews.llvm.org/D32921

llvm-svn: 305204
2017-06-12 17:15:41 +00:00
Sanjay Patel d4765a38b4 [DAG] add helper to bind memop chains; NFCI
This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192
2017-06-12 14:41:48 +00:00
Amaury Sechet 2127452ff7 [DAGCombine] Make sure we check the ResNo from UADDO before combining
Summary: UADDO has 2 result, and one must check the result no before doing any kind of combine. Without it, the transform is invalid.

Reviewers: joerg

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34088

llvm-svn: 305162
2017-06-11 11:36:38 +00:00
Zvi Rackover 3a2c4b48bb SelectionDAG: Remove deleted nodes from legalized set to avoid clash with newly created nodes
Summary:
During DAG legalization loop in SelectionDAG::Legalize(),
bookkeeping of the SDNodes that were already legalized is implemented
with SmallPtrSet (LegalizedNodes). This kind of set stores only pointers
to objects, not the objects themselves. Unfortunately, if SDNode is
deleted during legalization for some reason, LegalizedNodes set is not
informed about this fact. This wouldn’t be so bad, if SelectionDAG wouldn’t reuse
space deallocated after deletion of unused nodes, for creation of new
ones. Because of this, new nodes, created during legalization often can
have pointers identical to ones that have been previously legalized,
added to the LegalizedNodes set, and deleted afterwards. This in turn
causes, that newly created nodes, sharing the same pointer as deleted
old ones, are present in LegalizedNodes *already at the moment of
creation*, so we never call Legalize on them.
The fix facilitates the fact, that DAG notifies listeners about each
modification. I have registered DAGNodeDeletedListener inside
SelectionDAG::Legalize, with a callback function that removes any
pointer of any deleted SDNode from the LegalizedNodes set. With this
modification, LegalizeNodes set does not contain pointers to nodes that
were deleted, so newly created nodes can always be inserted to it, even
if they share pointers with old deleted nodes.

Patch by pawel.szczerbuk@intel.com

The issue this patch addresses causes failures in an out-of-tree target,
and i was not able to create a reproducer for an in-tree target, hence
there is no test-case.

Reviewers: delena, spatel, RKSimon, hfinkel, davide, qcolombet

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33891

llvm-svn: 305084
2017-06-09 14:53:45 +00:00
Simon Dardis 212cccb2f4 Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

The previous version of this patch had a "conditional move or jump depends on
uninitialized value".

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 305083
2017-06-09 14:37:08 +00:00
Nirav Dave 43a4d8122f Prevent RemoveDeadNodes from deleted already deleted node.
This prevents against assertion errors like PR32659 which occur from a
replacement deleting a node after it's been added to the list argument
of RemoveDeadNodes. The specific failure from PR32659 does not
currently happen, but it is still potentially possible. The underlying
cause is that the callers of the change dfunction builds up a list of
nodes to delete after having moved their uses and it possible that a
move of a later node will cause a previously deleted nodes to be
deleted.

Reviewers: bkramer, spatel, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33731

llvm-svn: 305070
2017-06-09 12:57:35 +00:00
Eugene Zelenko 6ac7a34816 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 304954
2017-06-07 23:53:32 +00:00
Nirav Dave 772ea3ae1a [DAG] Improve Store Merge candidate pruning. NFC.
When considering merging stores values are the results of loads only
consider stores whose values come from loads from the same base.

This fixes much of the longer compile times in PR33330.

llvm-svn: 304934
2017-06-07 18:51:56 +00:00
Simon Pilgrim be8866f691 [DAG] Move SelectionDAG::isCommutativeBinOp to TargetLowering.
This will allow commutation of target-specific DAG nodes in future patches

Differential Revision: https://reviews.llvm.org/D33882

llvm-svn: 304911
2017-06-07 14:05:04 +00:00
Sanjay Patel 2726ea2ac0 [DAG] remove duplicated code for isOnlyUsedInZeroEqualityComparison(); NFCI
llvm-svn: 304822
2017-06-06 19:40:09 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Mandeep Singh Grang 5e1697ef28 [llvm] Remove double semicolons
Reviewers: craig.topper, arsenm, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33924

llvm-svn: 304767
2017-06-06 05:08:36 +00:00
Davide Italiano fb4d5c095b [SelectionDAG] Update the dominator after splitting critical edges.
Running `llc -verify-dom-info` on the attached testcase results in a
crash in the verifier, due to a stale dominator tree.

i.e.

  DominatorTree is not up to date!
  Computed:
  =============================--------------------------------
  Inorder Dominator Tree:
    [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,7}
      [2] %lor.lhs.false.i61.i.i.i {1,2}
      [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,6}
        [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}

  Actual:
  =============================--------------------------------
  Inorder Dominator Tree:
    [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,9}
      [2] %lor.lhs.false.i61.i.i.i {1,2}
      [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,8}
        [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}
        [3] %safe_mod_func_int8_t_s_s.exit.i.i.i.lor.lhs.false.i61.i.i.i_crit_edge {6,7}

This is because in `SelectionDAGIsel` we split critical edges without
updating the corresponding dominator for the function (and we claim
in `MachineFunctionPass::getAnalysisUsage()` that the domtree is preserved).

We could either stop preserving the domtree in `getAnalysisUsage`
or tell `splitCriticalEdge()` to update it.
As the second option is easy to implement, that's the one I chose.

Differential Revision:  https://reviews.llvm.org/D33800

llvm-svn: 304742
2017-06-05 22:16:41 +00:00
Sanjay Patel 6350de76fa [DAGCombine] Fix unchecked calls to DAGCombiner::*ExtPromoteOperand
Other calls to DAGCombiner::*PromoteOperand check the result, but here it could cause an assertion in getNode. 
Falling back to any extend in this case instead of failing outright seems correct to me.

No test case because:
The failure was triggered by an out of tree backend. In order to trigger it, a backend would need to overload 
TargetLowering::IsDesirableToPromoteOp to return true for a type for which ISD::SIGN_EXTEND_INREG is marked 
illegal. In tree, only X86 overloads and sometimes returns true for MVT::i16 yet it marks 
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16  , Legal);.

Patch by Jacob Young!

Differential Revision: https://reviews.llvm.org/D33633

llvm-svn: 304723
2017-06-05 17:01:10 +00:00
Galina Kistanova bd79f73f02 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304635
2017-06-03 05:11:14 +00:00
Eugene Zelenko c85638b29d [CodeGen] Fix Windows builds which treat warnings as errors, broken in r304621.
llvm-svn: 304627
2017-06-03 01:04:06 +00:00
Eugene Zelenko 167595ab51 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 304621
2017-06-03 00:22:41 +00:00
Philip Reames b70cecd60a [Statepoint] Be consistent about using deopt naming [NFCI]
We'd called this "vm state" in the early days, but have long since standardized on calling it "deopt" in line with the operand bundle tag.  Fix a few cases we'd missed.

llvm-svn: 304607
2017-06-02 23:03:26 +00:00
Sanjay Patel cdb5dad4cc [TargetLowering] fix formatting; NFC
llvm-svn: 304569
2017-06-02 17:35:02 +00:00
Amaury Sechet 437f7060fe nits in TargetLowering.cpp . NFC
llvm-svn: 304532
2017-06-02 09:18:18 +00:00
Max Kazantsev 4d8748a987 [SelectionDAG] Get rid of recursion in findNonImmUse
The recursive implementation of findNonImmUse may overflow stack
on extremely long use chains. This patch replaces it with an equivalent
iterative implementation.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D33775

llvm-svn: 304522
2017-06-02 07:11:00 +00:00
Nirav Dave 4952871630 [SDAG] Fix CombineTo ordering in visitZERO_EXTEND and visitSIGN_EXTEND
Reorder CombineTo Calls to prevent references to stale/deleted SDNodes which caused undue assertions.

Reviewers: dbabokin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31625

llvm-svn: 304460
2017-06-01 19:33:50 +00:00
Matt Arsenault b083570532 DAG: Remove pointless type check
These are only integer operations.

llvm-svn: 304417
2017-06-01 14:49:46 +00:00
Amaury Sechet c84cc230b3 Only generate addcarry node when it is legal.
Summary:
This is a problem uncovered by stage2 testing. ADDCARRY end up being generated on target that do not support it.

The patch that introduced the problem has other patches layed on top of it, so we want to fix the issue rather than revert it to avoid creating a lor of churn.

A regression test will be added shortly, but this is committed as this in order to get the build back to green promptly.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33770

llvm-svn: 304409
2017-06-01 12:03:16 +00:00
Amaury Sechet 251ea8a4f8 Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary:
This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process.

This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33374

llvm-svn: 304404
2017-06-01 11:14:17 +00:00
Amaury Sechet 9c5d1e966b [DAGCombine] Refactor common addcarry pattern.
Summary: This pattern is no very useful per se, but it exposes optimization for toehr patterns that wouldn't kick in otherwize. It's very common and worth optimizing for.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32756

llvm-svn: 304402
2017-06-01 10:48:04 +00:00
Amaury Sechet 2e43cb6d03 [DAGCombine] (add/uaddo X, Carry) -> (addcarry X, 0, Carry)
Summary:
This enables further transforms.

Depends on D32916

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32925

llvm-svn: 304401
2017-06-01 10:42:39 +00:00
Nirav Dave 3424373f30 [ScheduleDAG] Deal with already scheduled loads in ScheduleDAG.
Summary:
If we attempt to unfold an SUnit in ScheduleDAG that results in
finding an already scheduled load, we must should abort the
unfold as it will not improve scheduling.

This fixes PR32610.

Reviewers: jmolloy, sunfish, bogner, spatel

Subscribers: llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D32911

llvm-svn: 304321
2017-05-31 18:43:17 +00:00
Nirav Dave 7c70fddba6 [DAG] Avoid use of stale store.
Correct references to alignment of store which may be deleted in a
previous iteration of merge. Instead use first store that would be
merged.

Corrects pr33172's use-after-poison caught by ASan.

Reviewers: spatel, hfinkel, RKSimon

Reviewed By: RKSimon

Subscribers: thegameg, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33686

llvm-svn: 304299
2017-05-31 13:36:17 +00:00
Craig Topper 5fd588be34 [SelectionDAG] Remove special case for ISD::FPOWI from the strict FP intrinsic handling.
This code was compensating for FPOWI defaulting to Legal and many targets not changing it to Expand. This was fixed in r304215 to default to Expand so this special handling should no longer be necessary.

llvm-svn: 304221
2017-05-30 17:12:18 +00:00
Craig Topper f6d4dc5b4a [SelectionDAG] Set ISD::FPOWI to Expand by default
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".

This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits

Differential Revision: https://reviews.llvm.org/D33530

llvm-svn: 304215
2017-05-30 15:27:55 +00:00
Sanjay Patel 51152a3727 [DAGCombiner] fix load narrowing transform to exclude loads with extension
The extending load possibility was missed in:
https://reviews.llvm.org/rL304072

We might want to handle this cases as a follow-up, but bailing out for now
to avoid miscompiling.

llvm-svn: 304153
2017-05-29 13:24:58 +00:00
Sanjay Patel 33f4a97287 [DAGCombiner] use narrow load to avoid vector extract
If we have (extract_subvector(load wide vector)) with no other users, 
that can just be (load narrow vector). This is intentionally conservative.
Follow-ups may loosen the one-use constraint to account for the extract cost
or just remove the one-use check.

The memop chain updating is based on code that already exists multiple times
in x86 lowering, so that should be pulled into a helper function as a follow-up.

Background: this is a potential improvement noticed via regressions caused by
making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see 
comments in D33137).

Differential Revision: https://reviews.llvm.org/D33578

llvm-svn: 304072
2017-05-27 14:07:03 +00:00
Benjamin Kramer debb3c35e0 Make helper functions static. NFC.
llvm-svn: 304029
2017-05-26 20:09:00 +00:00
Sanjay Patel ec13ebf2c8 [DAGCombiner] use narrow vector ops to eliminate concat/extract (PR32790)
In the best case:
extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN
...we kill all of the extract/concat and just have narrow binops remaining.

If only one of the binop operands is amenable, this transform is still
worthwhile because we kill some of the extract/concat.

Optional bitcasting makes the code more complicated, but there doesn't
seem to be a way to avoid that.

The TODO about extending to more than bitwise logic is there because we really
will regress several x86 tests including madd, psad, and even a plain
integer-multiply-by-2 or shift-left-by-1. I don't think there's anything
fundamentally wrong with this patch that would cause those regressions; those
folds are just missing or brittle.

If we extend to more binops, I found that this patch will fire on at least one
non-x86 regression test. There's an ARM NEON test in
test/CodeGen/ARM/coalesce-subregs.ll with a pattern like:

            t5: v2f32 = vector_shuffle<0,3> t2, t4
          t6: v1i64 = bitcast t5
          t8: v1i64 = BUILD_VECTOR Constant:i64<0>
        t9: v2i64 = concat_vectors t6, t8
      t10: v4f32 = bitcast t9
    t12: v4f32 = fmul t11, t10
  t13: v2i64 = bitcast t12
t16: v1i64 = extract_subvector t13, Constant:i32<0>

There was no functional change in the codegen from this transform from what I
could see though.

For the x86 test changes:

1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case,
   but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops,
   there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op.
   SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat
   ops to match the pattern.
2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract.
   Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026
3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT.
4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction
   count by one in each case by eliminating two insert/extract while adding one narrower logic op.

https://bugs.llvm.org/show_bug.cgi?id=32790

Differential Revision: https://reviews.llvm.org/D33137

llvm-svn: 303997
2017-05-26 15:33:18 +00:00
Nirav Dave 689709c928 [DAG] Move legal type checks in store merge to be checked only
on non-legal cases. NFC.

llvm-svn: 303994
2017-05-26 14:37:27 +00:00
John Brawn 9009d2905d [ARM] Fix lowering of misaligned memcpy/memset
Currently getOptimalMemOpType returns i32 for large enough sizes without
checking for alignment, leading to poor code generation when misaligned accesses
aren't permitted as we generate a word store then later split it up into byte
stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for
memset we splat the memset value into a word then immediately split it up
again.

Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type
to use, but also fix a bug there where it wasn't correctly checking if
misaligned memory accesses are allowed.

Differential Revision: https://reviews.llvm.org/D33442

llvm-svn: 303990
2017-05-26 13:59:12 +00:00
Andrew Kaylor f466001eef Add constrained intrinsics for some libm-equivalent operations
Differential revision: https://reviews.llvm.org/D32319

llvm-svn: 303922
2017-05-25 21:31:00 +00:00
Adrian Prantl f062192632 Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI vars
This fixes an oversight in r300522, which changed alloca
dbg.values to no longer emit a DW_OP_deref.

The array.ll testcase was regenerated from source.

Fixes PR33166:
https://bugs.llvm.org/show_bug.cgi?id=33166

llvm-svn: 303897
2017-05-25 18:54:10 +00:00
Nirav Dave 7a8717d216 [DAG] Prevent crashes when merging constant stores with high-bit set. NFC.
llvm-svn: 303802
2017-05-24 19:56:39 +00:00
Tim Northover 8c605c0eda Revert LLVM changes for "Sema: allow imaginary constants via GNU extension if UDL overloads not present."
The changes accidentally crept into a Clang commit I was making.

llvm-svn: 303697
2017-05-23 21:53:11 +00:00
Tim Northover 6b5eceac2e Sema: allow imaginary constants via GNU extension if UDL overloads not present.
C++14 added user-defined literal support for complex numbers so that you can
write something like "complex<double> val = 2i". However, there is an existing
GNU extension supporting this syntax and interpreting the result as a _Complex
type.

This changes parsing so that such literals are interpreted in terms of C++14's
operators if an overload is present but otherwise falls back to the original
GNU extension.

llvm-svn: 303694
2017-05-23 21:41:49 +00:00
Nirav Dave 6c910c0dd8 [DAG] Add AddressSpace parameter to canMergeStoresTo. NFC.
llvm-svn: 303673
2017-05-23 18:53:02 +00:00
Nirav Dave 3b4f7cc0b3 [DAG] Add canMergeStoresTo predicate checks. NFCI.
Propagate canMergeStoresTo checks to missing cases in StoreMerge.

llvm-svn: 303668
2017-05-23 18:33:09 +00:00
Craig Topper 7e0aeeb884 [KnownBits] Use !hasConflict() in asserts in place of Zero & One == 0 or similar. NFC
llvm-svn: 303614
2017-05-23 07:18:37 +00:00
Nirav Dave e00da22ef3 [DAG] Rework store merge to loop on load candidates. NFCI.
Continue to consider remaining candidate merges until all possible
merges have been considered.

llvm-svn: 303560
2017-05-22 15:33:47 +00:00
Matthias Braun 50ec0b5dce SimplifyLibCalls: Optimize wcslen
Refactor the strlen optimization code to work for both strlen and wcslen.

This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.

This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.

Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.

The refactoring "accidentally" fixes http://llvm.org/PR32124

Differential Revision: https://reviews.llvm.org/D32839

llvm-svn: 303461
2017-05-19 22:37:09 +00:00
Amaury Sechet 77cfb4a85f [DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X)
Summary:
While this makes some case better and some case worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation.

As per discussion in D32756 .

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32916

llvm-svn: 303441
2017-05-19 18:20:44 +00:00
Craig Topper 8a950275f7 [Statistics] Add a method to atomically update a statistic that contains a maximum
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:

  MaxNumFoo = std::max(MaxNumFoo, NumFoo);

or

  MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;

The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare.  But we have no way of knowing if the value was changed by another thread between the reads and the writes.

This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again.

This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ

Reviewers: dberlin, chandlerc, hfinkel, dblaikie

Reviewed By: chandlerc

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D33301

llvm-svn: 303318
2017-05-18 00:51:39 +00:00
Nirav Dave da8f221273 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This is causes small improvements dealing with intrinsics
lowered to stores.

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  associated with va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Nirav Dave cfd357a61a [DAG] Prune deleted nodes in TokenFactor
Fix visitTokenFactor to correctly remove deleted nodes. NFC.

llvm-svn: 303181
2017-05-16 15:49:02 +00:00
Peter Collingbourne 6f0ecca3b5 IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape().
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.

Differential Revision: https://reviews.llvm.org/D33162

llvm-svn: 303134
2017-05-16 00:39:01 +00:00
Simon Pilgrim 754c1618ec [SelectionDAG] Added support for EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts in ComputeNumSignBits
llvm-svn: 302997
2017-05-13 22:10:58 +00:00
Simon Pilgrim 7666afd042 [SelectionDAG] Add VECTOR_SHUFFLE support to ComputeNumSignBits
llvm-svn: 302993
2017-05-13 19:57:10 +00:00
Craig Topper 9fe357971c [ValueTracking] Remove const_casts on several calls to computeKnownBits and ComputeSignBit. NFC
llvm-svn: 302991
2017-05-13 17:22:16 +00:00
Tim Shen 10c64e6aea [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.
Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.

However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.

The tests in shift_mask.ll still pass.

Reviewers: echristo, hfinkel, efriedma, iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D33076

llvm-svn: 302937
2017-05-12 19:25:37 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Simon Pilgrim eabf6fc4b5 [DAGCombine] Use SelectionDAG::getAnyExtOrTrunc helper. NFCI.
llvm-svn: 302907
2017-05-12 15:26:50 +00:00
Simon Pilgrim f01c301f72 [DAGCombine] Use SelectionDAG::getZExtOrTrunc helper. NFCI.
llvm-svn: 302897
2017-05-12 13:22:12 +00:00
Simon Pilgrim a6ed1b2f12 Use SDValue::getOperand() helper. NFCI.
llvm-svn: 302896
2017-05-12 13:20:24 +00:00
Vadzim Dambrouski 38e30197c3 [MSP430] Generate EABI-compliant libcalls
Updates the MSP430 target to generate EABI-compatible libcall names.
As a byproduct, adjusts the hardware multiplier options available in
the MSP430 target, adds support for promotion of the ISD::MUL operation
for 8-bit integers, and correctly marks R11 as used by call instructions.

Patch by Andrew Wygle.

Differential Revision: https://reviews.llvm.org/D32676

llvm-svn: 302820
2017-05-11 19:56:14 +00:00
Simon Pilgrim 6faddcbd07 [DAGCombine] Use SelectionDAG::getAnyExtOrTrunc helper. NFCI.
llvm-svn: 302808
2017-05-11 16:40:44 +00:00
Simon Pilgrim a4a13a0da0 Strip trailing whitespace. NFCI.
llvm-svn: 302784
2017-05-11 10:03:05 +00:00
David L. Jones bbd97d273b Revert "[SDAG] Relax conditions under stores of loaded values can be merged"
This reverts r302712.

The change fails with ASAN enabled:

ERROR: AddressSanitizer: use-after-poison on address ... at ...
READ of size 2 at ... thread T0
  #0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42
  #1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3
  #2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17
  #3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7

Reviewers: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33081

llvm-svn: 302746
2017-05-10 23:56:21 +00:00
Nirav Dave a38c049fc5 [SDAG] Relax conditions under stores of loaded values can be merged
Summary:

Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.

Fixes PR32086.

Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30471

llvm-svn: 302712
2017-05-10 19:53:41 +00:00
Amaury Sechet 197685c6d8 Small refactoring in DAGCombine. NFC
llvm-svn: 302699
2017-05-10 17:58:28 +00:00
Simon Pilgrim cd4d913336 [DAGCombiner] Dropped explicit (sra 0, x) -> 0 and (sra -1, x) -> 0 folds.
These are both handled (and tested) by the earlier ComputeNumSignBits == EltSizeInBits fold.

llvm-svn: 302651
2017-05-10 13:06:26 +00:00
Simon Pilgrim c29af824bf [DAGCombiner] Add vector support to fold (shl/srl 0, x) -> 0
llvm-svn: 302641
2017-05-10 12:34:27 +00:00
Ahmed Bougacha 604526fe87 [CodeGen] Don't require AA in SDAGISel at -O0.
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.

Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.

But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.

Don't require AA at CodeGenOpt::None, and only use it otherwise.

This does have a functional impact (and one testcase is pessimized
because we can't reuse a load).  But I think that's desirable no matter
what.

Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG.  That will be
fixed separately.

Differential Revision: https://reviews.llvm.org/D32766

llvm-svn: 302611
2017-05-10 00:39:30 +00:00
Zvi Rackover b483e28c77 DAGCombine: Combine shuffles of splat-shuffles
Summary: Reapply r299047, but this time handle correctly splat-masks with undef elements.

Reviewers: spatel, RKSimon, eli.friedman, andreadb

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31961

llvm-svn: 302583
2017-05-09 20:25:38 +00:00
Reid Kleckner 3a363fff7e Re-land "Use the frame index side table for byval and inalloca arguments"
This re-lands r302483. It was not the cause of PR32977.

llvm-svn: 302544
2017-05-09 16:02:20 +00:00
Reid Kleckner 84075fddff Re-land "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This re-lands commit r302461. It was not the cause of PR32977.

llvm-svn: 302543
2017-05-09 16:01:47 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Amara Emerson cf9daa33a7 Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2
  shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
  factored out into LoopUtils, and now have a unified interface for generating
  reductions regardless of the preference of the target. LoopUtils now uses TTI
  to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.

Differential Revision: https://reviews.llvm.org/D30086

llvm-svn: 302514
2017-05-09 10:43:25 +00:00
Reid Kleckner 41bb94233b Revert "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This reverts commit r302461.

It appears to be causing failures compiling gtest with debug info on the
Linux sanitizer bot. I was unable to reproduce the failure locally,
however.

llvm-svn: 302504
2017-05-09 01:57:44 +00:00
Reid Kleckner 9f29914d40 Revert "Use the frame index side table for byval and inalloca arguments"
This reverts r302483 and it's follow up fix.

llvm-svn: 302493
2017-05-09 01:14:39 +00:00
Reid Kleckner 45efcf0c96 Use the frame index side table for byval and inalloca arguments
Summary:
For inalloca functions, this is a very common code pattern:

  %argpack = type <{ i32, i32, i32 }>
  define void @f(%argpack* inalloca %args) {
  entry:
    %a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
    %b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
    %c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
    tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
    tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
    tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")

Even though these GEPs can be simplified to a constant offset from EBP
or RSP, we don't do that at -O0, and each GEP is computed into a
register. Registers used to compute argument addresses are typically
spilled and clobbered very quickly after the initial computation, so
live debug variable tracking loses information very quickly if we use
DBG_VALUE instructions.

This change moves processing of dbg.declare between argument lowering
and basic block isel, so that we can ask if an argument has a frame
index or not. If the argument lives in a register as is the case for
byval arguments on some targets, then we don't put it in the side table
and during ISel we emit DBG_VALUE instructions.

Reviewers: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32980

llvm-svn: 302483
2017-05-08 23:20:27 +00:00
Reid Kleckner bf828eedb4 Don't add DBG_VALUE instructions for static allocas in dbg.declare
Summary:
An llvm.dbg.declare of a static alloca is always added to the
MachineFunction dbg variable map, so these values are entirely
redundant. They survive all the way through codegen to be ignored by
DWARF emission.

Effectively revert r113967

Two bugpoint-reduced test cases from 2012 broke as a result of this
change. Despite my best efforts, I haven't been able to rewrite the test
case using dbg.value. I'm not too concerned about the lost coverage
because these were reduced from the test-suite, which we still run.

Reviewers: aprantl, dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32920

llvm-svn: 302461
2017-05-08 19:58:15 +00:00
Dean Michael Berris 9bcaed867a [XRay] Custom event logging intrinsic
This patch introduces an LLVM intrinsic and a target opcode for custom event
logging in XRay. Initially, its use case will be to allow users of XRay to log
some type of string ("poor man's printf"). The target opcode compiles to a noop
sled large enough to enable calling through to a runtime-determined relative
function call. At runtime, when X-Ray is enabled, the sled is replaced by
compiler-rt with a trampoline to the logic for creating the custom log entries.

Future patches will implement the compiler-rt parts and clang-side support for
emitting the IR corresponding to this intrinsic.

Reviewers: timshen, dberris

Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D27503

llvm-svn: 302405
2017-05-08 05:45:21 +00:00
Simon Pilgrim 2c15447f99 [DAGCombiner] If ISD::ABS is legal/custom, use it directly instead of canonicalizing first.
Remove an extra canonicalization step if ISD::ABS is going to be used anyway.

Updated x86 abs combine to check that we are lowering from both canonicalizations.

llvm-svn: 302337
2017-05-06 13:44:42 +00:00
Reid Kleckner ac1a97b32f Simplify dbg.value handling in SDISel with early returns
No functional change other than improving dbgs logging accuracy on
constant dbg values. Previously we would add things like "i32 42" as
debug values, and then log that we were dropping the debug info, which
is silly.

Delete some dead code that was checking for static allocas. This
remained after r207165, but served no purpose. Currently, static alloca
dbg.values are always sent through the DanglingDebugInfoMap, and are
usually made valid the first time the alloca is used.

llvm-svn: 302267
2017-05-05 18:30:34 +00:00
Craig Topper f0aeee01c3 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

llvm-svn: 302262
2017-05-05 17:36:09 +00:00
Chad Rosier 84a238dd62 [DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)).
Differential Revision: http://reviews.llvm.org/D32596

llvm-svn: 302153
2017-05-04 14:14:44 +00:00
Krzysztof Parzyszek 41b6e14dc5 Refactoring with range-based for, NFC
Patch by Wei-Ren Chen.

Differential Revision: https://reviews.llvm.org/D32682

llvm-svn: 302148
2017-05-04 13:35:17 +00:00
Craig Topper d4d09fd73d [SelectionDAG] Improve known bits support for CTPOP.
This is based on the same concept from ValueTracking's version of computeKnownBits.

llvm-svn: 302110
2017-05-04 04:33:27 +00:00
Craig Topper d938fd1397 [KnownBits] Add zext, sext, and trunc methods to KnownBits
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.

Differential Revision: https://reviews.llvm.org/D32784

llvm-svn: 302088
2017-05-03 22:07:25 +00:00
Sanjay Patel e1cf61c69f [TargetLowering] use isSubsetOf in SimplifyDemandedBits; NFCI
This is the DAG equivalent of https://reviews.llvm.org/D32255 , 
which will hopefully be committed again. The functionality
(preferring a 'not' op) is already here in the DAG, so this is
just intended to be a clean-up and performance improvement.

llvm-svn: 302087
2017-05-03 21:55:34 +00:00
Amaury Sechet 666c705953 [DAGCombine] (addcarry (add|uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry)
Summary: Do the transform when the carry isn't used. It's a pattern exposed when legalizing large integers.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32755

llvm-svn: 302047
2017-05-03 16:28:10 +00:00
Tim Shen e59d06fe78 [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction
Summary:
This is the corresponding llvm change to D28037 to ensure no performance
regression.

Reviewers: bogner, kbarton, hfinkel, iteratee, echristo

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28329

llvm-svn: 301990
2017-05-03 00:07:02 +00:00
Amaury Sechet 106a7eab84 [DAGCombine] (uaddo X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry)
Summary: This is a common pattern that arise when legalizing large integers operations. Only do it when Y + 1 cannot overflow as this would change the carry behavior of uaddo .

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32687

llvm-svn: 301922
2017-05-02 14:15:48 +00:00
Amaury Sechet 153911f71d [DAGCombine] (add X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry)
Summary: Common pattern when legalizing large integers operations. Similar to D32687, when the carry isn't used.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Differential Revision: https://reviews.llvm.org/D32738

llvm-svn: 301919
2017-05-02 13:34:25 +00:00
Simon Pilgrim 89ad89cc73 [SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088)
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.

This patch adds support for extension/truncation for both integer and float types.

Differential Revision: https://reviews.llvm.org/D32391

llvm-svn: 301910
2017-05-02 10:33:08 +00:00
Simon Pilgrim 8deb87a6c0 [DAGCombiner] Improve MatchBswapHword logic (PR31357)
The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade).

Patch originally introduced by Jim Lewis.

Submitted on the behalf of Dinar Temirbulatov.

Differential Revision: https://reviews.llvm.org/D32039

llvm-svn: 301907
2017-05-02 10:16:19 +00:00
Craig Topper 6b1b630a98 [SelectionDAG] Use known ones to provide a better bound for the known zeros for CTTZ/CTLZ operations.
This is the SelectionDAG version of D32521. If know where at least one 1 is located in the input to these intrinsics we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output.

I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets. Usually because of turning ADD into OR based on this new information.

I'll fix CTPOP in a future patch.

Differential Revision: https://reviews.llvm.org/D32692

llvm-svn: 301806
2017-05-01 16:08:06 +00:00
Amara Emerson d28f0cd448 Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

Differential Revision: https://reviews.llvm.org/D32527

llvm-svn: 301803
2017-05-01 15:17:51 +00:00
Sanjay Patel ad13826aea [DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657)
We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that
patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check
legality of the select that we want to produce.

A few things to note:

1. We can't wait until after legalization and do this generically because (at least in the x86
   tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but
   that requires a closer look to make sure we don't end up worse.
3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. 
   That should be fixed next.
4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple 
   legal vector sizes, but if there are other targets like that, we should add more tests.
5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests):
   despite IR that is terrible for the target, this patch allows us to generate the optimal loop
   code because something post-ISEL is hoisting the splat extends above the vector loops.

Differential Revision: https://reviews.llvm.org/D32620

llvm-svn: 301781
2017-04-30 22:44:51 +00:00
Amaury Sechet 8ac81f3924 Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29872

llvm-svn: 301775
2017-04-30 19:24:09 +00:00
Craig Topper 778f57b4f1 [APInt] Replace calls to setBits with more specific calls to setBitsFrom and setLowBits where possible.
llvm-svn: 301768
2017-04-30 07:44:58 +00:00
Craig Topper ca48af3c87 [KnownBits] Add methods for determining if the known bits represent a negative/nonnegative number and add methods for changing the negative/nonnegative state
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.

Reviewers: RKSimon, spatel, davide

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32651

llvm-svn: 301747
2017-04-29 16:43:11 +00:00
Reid Kleckner 859f8b544a Make getParamAlignment use argument numbers
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.

Avoids confusing code like:
  IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
  Alignment  = CS->getParamAlignment(ArgIdx + 1);

Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.

This is a potentially breaking change for out-of-tree backends that do
their own call lowering.

llvm-svn: 301682
2017-04-28 20:34:27 +00:00
Matthias Braun 744c215e29 TargetLowering: Add finalizeLowering() function; NFC
Adds a new method finalizeLowering to TargetLoweringBase. This is in
preparation for an upcoming commit.

This function is meant for target specific adjustments to
MachineFrameInfo or register reservations.

Move the freezeRegisters() and the hasCopyImplyingStackAdjustment()
handling into the new function to prove the concept. As an added bonus
GlobalISel no longer missed the hasCopyImplyingStackAdjustment()
handling with this.

Differential Revision: https://reviews.llvm.org/D32621

llvm-svn: 301679
2017-04-28 20:25:05 +00:00
Reid Kleckner 6652a52e2b Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.

NFC

llvm-svn: 301666
2017-04-28 18:37:16 +00:00
Craig Topper 24db6b800f [APInt] Add clearSignBit method. Use it and setSignBit in a few places. NFCI
llvm-svn: 301656
2017-04-28 16:58:05 +00:00
Jun Bum Lim 919f9e8d65 [InlineCost] Improve the cost heuristic for Switch
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets

```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)

```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp             w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp             w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp             w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add             w8, w19, w8
        cmp             w8, #9          // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and             w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add             w8, w19, w8
        cmp             w8, #2          // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp             w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.

This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
  -inline-generic-switch-cost=false

This change was originally proposed by Haicheng in D29870.

Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier

Reviewed By: hans

Subscribers: joerg, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D31085

llvm-svn: 301649
2017-04-28 16:04:03 +00:00
Simon Pilgrim 7ae9419dc0 [DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT (reapplied)
Reapplied r299221 after fix for nondeterminism in ThinLTO builder (rL301599), with extra check for implicit truncation of inserted element.

llvm-svn: 301644
2017-04-28 13:21:18 +00:00
Craig Topper f42b23f7d8 [ValueTracking] Convert computeKnownBitsFromRangeMetadata to use KnownBits struct.
llvm-svn: 301626
2017-04-28 06:28:56 +00:00
Craig Topper d0af7e8ab8 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Craig Topper 0e03e74e95 [SelectionDAG] Use various APInt methods to reduce temporary APInt creation
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.

llvm-svn: 301618
2017-04-28 04:57:59 +00:00
Craig Topper 24e71017aa [APInt] Use inplace shift methods where possible. NFCI
llvm-svn: 301612
2017-04-28 03:36:24 +00:00
Sanjoy Das 40c32dd9a0 Use a pointer type for target frame indices during statepoint lowering
Summary:
The type of the target frame index is intptr, not the type of the value we're
going to store into it.  Without this change we crash in the attached test case
when trying to type-legalize a TargetFrameIndex.

Patchpoint lowering types the target frame index as intptr as well.

Reviewers: reames, bogner, arsenm

Subscribers: arsenm, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D32256

llvm-svn: 301566
2017-04-27 17:17:16 +00:00
Sanjay Patel a0547c3d9f [DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X)
Besides better codegen, the motivation is to be able to canonicalize this pattern 
in IR (currently we don't) knowing that the backend is prepared for that.

This may also allow removing code for special constant cases in 
DAGCombiner::foldSelectOfConstants() that was added in D30180.

Differential Revision: https://reviews.llvm.org/D31944

llvm-svn: 301457
2017-04-26 20:26:46 +00:00
Craig Topper b45eabcf82 [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.

Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.

I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.

Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.

Differential Revision: https://reviews.llvm.org/D32376

llvm-svn: 301432
2017-04-26 16:39:58 +00:00
Sanjay Patel e2ec05a62a [TargetLowering] fix isConstTrueVal to account for build vector truncation
Build vectors have magical truncation powers, so we have things like this:

v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>

If we don't truncate the splat node returned by getConstantSplatNode(), then we won't find 
truth when ZeroOrNegativeOneBooleanContent is the rule.

Differential Revision: https://reviews.llvm.org/D32505

llvm-svn: 301408
2017-04-26 14:05:42 +00:00
Ranjeet Singh acbd4e141f Fix signed multiplication with overflow fallback.
For targets that don't have ISD::MULHS or ISD::SMUL_LOHI for the type
and the double width type is illegal, then the two operands are
sign extended to twice their size then multiplied to check for overflow.
The extended upper halves were mismatched causing an incorrect result.
This fixes the mismatch.

A test was added for ARM V6-M where the bug was detected.

Patch by James Duley.

Differential Revision: https://reviews.llvm.org/D31807

llvm-svn: 301404
2017-04-26 13:41:43 +00:00
Sanjay Patel a4b4e9388c [DAG] add FIXME comments for splat detection; NFC
llvm-svn: 301403
2017-04-26 13:27:57 +00:00
Sanjay Patel 7a8317c09a [DAG] fix formatting of isConstantSplat(); NFC
llvm-svn: 301366
2017-04-25 23:33:28 +00:00
Simon Pilgrim 8264ed7075 [DAGCombiner] Refactor to make it easy to add support for vectors in a future patch. NFCI.
llvm-svn: 301320
2017-04-25 16:16:03 +00:00
Simon Pilgrim 37ef04ad1f [SelectionDAG] Use getBuildVector helper where possible. NFCI
llvm-svn: 301314
2017-04-25 15:10:47 +00:00
Simon Pilgrim 986d73cc1d [SelectionDAG] Pull out repeated getValueType calls. NFCI.
Noticed in D32391.

llvm-svn: 301308
2017-04-25 13:39:07 +00:00
Simon Pilgrim 7d65b66962 [DAGCombiner] Add vector support for (srl (trunc (srl x, c1)), c2) combine.
llvm-svn: 301305
2017-04-25 12:40:45 +00:00
Simon Pilgrim ab0446332e [SelectionDAG] Recognise splat vector isKnownToBeAPowerOfTwo one/sign bit shift cases.
llvm-svn: 301303
2017-04-25 12:29:07 +00:00
Simon Pilgrim 96611aa30c [DAGCombiner] Use SDValue::getConstantOperandVal helper where possible. NFCI.
llvm-svn: 301300
2017-04-25 10:47:35 +00:00
Simon Pilgrim 93da6660a2 [DAGCombiner] Use APInt::intersects to avoid tmp variable. NFCI.
llvm-svn: 301258
2017-04-24 21:43:21 +00:00
Krzysztof Parzyszek c8e8e2a046 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301234
2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek 98ab4c64c4 Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.

llvm-svn: 301232
2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek c0197066d7 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301231
2017-04-24 19:43:45 +00:00
Yaxun Liu fd23a0c095 CodeGen: Add a hook for getFenceOperandTy
Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0.
This is fine for most targets. However for amdgcn target, the size of pointer in address space 0
depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is
32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target
triple environment. Therefore a hook is need in target lowering for getting the fence operand type.

This patch has no effect on targets other than amdgcn.

Differential Revision: https://reviews.llvm.org/D32186

llvm-svn: 301215
2017-04-24 18:26:27 +00:00
Simon Pilgrim f60f57e6e8 [DAGCombiner] Updated bswap byte offset variable names to be more descriptive. NFC
As discussed on D32039, use MaskByteOffset to describe the variable and also pull out repeated getOpcode() calls.

llvm-svn: 301193
2017-04-24 17:05:14 +00:00
Nirav Dave c799f3a809 [SDAG] Teach Chain Analysis about BaseIndexOffset addressing.
While we use BaseIndexOffset in FindBetterNeighborChains to
appropriately realize they're almost the same address and should be
improved concurrently we do not use it in isAlias using the non-index
understanding FindBaseOffset instead. Adding a BaseIndexOffset check
in isAlias like should allow indexed stores to be merged.

FindBaseOffset to be excised in subsequent patch.

Reviewers: jyknight, aditya_nandakumar, bogner

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31987

llvm-svn: 301187
2017-04-24 15:37:20 +00:00
Renato Golin 4abfb3d741 Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API."
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.

For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."

PR32754.

llvm-svn: 301111
2017-04-23 12:15:30 +00:00
Artyom Skrobov 53cf1897cc [ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs
Summary:
D30400 has enabled tADC and tSBC instructions to be unglued, thereby allowing CPSR to remain live between Thumb1 scheduling units.

Most Thumb1 instructions have an OptionalDef for CPSR; but the scheduler ignored the OptionalDefs, and could unwittingly insert a flag-setting instruction in between an ADDS and the corresponding ADC.

Reviewers: javed.absar, atrick, MatzeB, t.p.northover, jmolloy, rengolin

Reviewed By: javed.absar

Subscribers: rogfer01, efriedma, aemerson, rengolin, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D31081

llvm-svn: 301106
2017-04-23 06:58:08 +00:00
Craig Topper 474e5de72d [APInt] Fix a few places that use APInt::getRawData to operate within the normal API.
getRawData exposes the internal type of the APInt class directly to its users. Ideally we wouldn't expose such an implementation detail.

This patch fixes a few of the easy cases by using truncate, extract, or a rotate.

llvm-svn: 301105
2017-04-23 06:41:11 +00:00
Craig Topper cdd5ae6676 [APInt] Use operator<<= where possible. NFC
llvm-svn: 301104
2017-04-23 05:43:02 +00:00
Craig Topper 5f68af0806 [APInt] Use operator<<= instead of shl where possible. NFC
llvm-svn: 301103
2017-04-23 05:18:31 +00:00
Craig Topper ae9672c96d [APInt] Use ashInPlace where possible.
llvm-svn: 301101
2017-04-23 03:45:59 +00:00
Akira Hatanaka 22e839f4b2 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 301019
2017-04-21 18:53:12 +00:00
Akira Hatanaka 78ccba6a20 Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

llvm-svn: 300940
2017-04-21 01:31:50 +00:00
Akira Hatanaka 19077aaee0 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300930
2017-04-21 00:05:16 +00:00
Akira Hatanaka 7b06cebe73 Revert "[AArch64] Improve code generation for logical instructions taking"
This reverts r300913.

This broke bots.

llvm-svn: 300916
2017-04-20 23:03:30 +00:00
Akira Hatanaka e327f09832 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300913
2017-04-20 22:47:56 +00:00
Benjamin Kramer 997fd5eeb4 [Recycler] Add asan/msan annotations.
This enables use after free and uninit memory checking for memory
returned by a recycler. SelectionDAG currently relies on the opcode of a
free'd node being ISD::DELETED_NODE, so poke a hole in the asan poison
for SDNode opcodes. This means that we won't find some issues, but only
in SDag.

llvm-svn: 300868
2017-04-20 18:29:37 +00:00
Yaxun Liu 5d977f8ed4 CodeGen: Let frame index value type match alloca addr space
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.

However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.

This patch fixes that.

Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.

AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.

Differential Revision: https://reviews.llvm.org/D32021

llvm-svn: 300864
2017-04-20 18:15:34 +00:00
Sanjay Patel 13985cd111 [DAGCombiner] use more local variables in isAlias(); NFCI
llvm-svn: 300860
2017-04-20 18:02:27 +00:00
Craig Topper bcfd2d1789 [APInt] Rename getSignBit to getSignMask
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.

Differential Revision: https://reviews.llvm.org/D32108

llvm-svn: 300856
2017-04-20 16:56:25 +00:00
Sanjay Patel 2d0e88fb9b [DAGCombiner] fix variable names in isAlias(); NFCI
We started with zero-based params and switched to one-based locals...
Also, variables start with a capital and functions do not.

llvm-svn: 300854
2017-04-20 16:36:37 +00:00
Sanjay Patel b7701bc9af [DAGCombiner] give names to repeated calcs in isAlias(); NFCI
llvm-svn: 300850
2017-04-20 16:15:08 +00:00
Amara Emerson 23e79ec2b3 [MVT][SVE] Scalable vector MVTs (3/3)
Adds MVT::ElementCount to represent the length of a
vector which may be scalable, then adds helper functions
that work with it.

Patch by Graham Hunter.

Differential Revision: https://reviews.llvm.org/D32019

llvm-svn: 300842
2017-04-20 13:54:09 +00:00
Amara Emerson 5054782052 [MVT][SVE] Scalable vector MVTs (1/3)
This patch adds a few helper functions to obtain new vector
value types based on existing ones without needing to care
about whether they are scalable or not.

I've confined their use to a few common locations right now,
and targets that don't have scalable vectors should never
need to care about these.

Patch by Graham Hunter.

Differential Revision: https://reviews.llvm.org/D32017

llvm-svn: 300838
2017-04-20 13:08:17 +00:00
Craig Topper 9ce5ef9475 [SelectionDAG] Fix another place that was passing a large value to APInt::lshrInPlace.
llvm-svn: 300821
2017-04-20 04:55:01 +00:00
Craig Topper d3884b8402 [SelectionDAG] Use getActiveBits() and countTrailingZeros() to avoid creating temporary APInts with lshr and trunc. NFCI
llvm-svn: 300819
2017-04-20 04:23:43 +00:00
Craig Topper 4db0c69373 Recommit "[APInt] Add back the asserts that check that the APInt shift methods aren't called with values larger than BitWidth."
This includes a fix to clamp a right shift of larger than BitWidth in DAG combining.

llvm-svn: 300816
2017-04-20 03:49:18 +00:00
Galina Kistanova 2cc97d92ce Temporarily revert r299221 to fix nondeterminism in ThinLTO builder.
llvm-svn: 300783
2017-04-19 23:16:14 +00:00
Sanjay Patel 0658a95a35 [DAG] add splat vector support for 'or' in SimplifyDemandedBits
I've changed one of the tests to not fold away, but we didn't and still don't do the transform
that the comment claims we do (and I don't know why we'd want to do that).

Follow-up to:
https://reviews.llvm.org/rL300725
https://reviews.llvm.org/rL300763

llvm-svn: 300772
2017-04-19 22:00:00 +00:00
Sanjay Patel ae382bb6af [DAG] add splat vector support for 'xor' in SimplifyDemandedBits
This allows forming more 'not' ops, so we get improvements for ISAs that have and-not.

Follow-up to:
https://reviews.llvm.org/rL300725

llvm-svn: 300763
2017-04-19 21:23:09 +00:00
Craig Topper 9b71a402c2 [APInt] Cast calls to add/sub/mul overflow methods to void if only their overflow bool out param is used.
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.

We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.

llvm-svn: 300758
2017-04-19 21:09:45 +00:00
Sanjay Patel ded7d59f0e [DAG] add splat vector support for 'and' in SimplifyDemandedBits
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in 
SimplifyDemandedBits().

Some notes for reference:

1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions. 
   Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
    make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since 
   they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards 
   improving the vector codegen in that patch without writing any actual new code.

Differential Revision: https://reviews.llvm.org/D32230

llvm-svn: 300725
2017-04-19 18:05:06 +00:00
Nirav Dave 8563fc4664 [DAG] Loop over remaining candidates on successful merge of stores of
extracted vectors types. NFCI.

llvm-svn: 300688
2017-04-19 13:52:38 +00:00
Chih-Hung Hsieh 877923a87f [X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.
Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.

Differential Revision: http://reviews.llvm.org/D32102

llvm-svn: 300583
2017-04-18 20:15:18 +00:00
Craig Topper fc947bcfba [APInt] Use lshrInPlace to replace lshr where possible
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.

Differential Revision: https://reviews.llvm.org/D32155

llvm-svn: 300566
2017-04-18 17:14:21 +00:00
Nirav Dave 855ef45602 [DAG] Improve store merge candidate pruning.
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

llvm-svn: 300561
2017-04-18 15:36:34 +00:00
Adrian Prantl 6825fb64e9 PR32382: Fix emitting complex DWARF expressions.
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident.  This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>

Differential Revision: https://reviews.llvm.org/D31439

llvm-svn: 300522
2017-04-18 01:21:53 +00:00
Reid Kleckner fb502d2f5e [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Nirav Dave 642ed1ef7e Reorder StoreMergeCandidates to run faster. NFCI.
llvm-svn: 300321
2017-04-14 13:34:30 +00:00
Andrew V. Tischenko 4e7bcd5216 Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.
Patch by Dinar Temirbulatov

llvm-svn: 300314
2017-04-14 09:17:09 +00:00
Nirav Dave 9acd2fd9d9 [DAG] Fold away temporary vector in store candidate merge NFC.
llvm-svn: 300241
2017-04-13 20:00:27 +00:00
Craig Topper 8b459c24f3 [SelectionDAG] Use APInt move assignment to avoid 2 memory allocations and copies when bit width is larger than 64-bits.
llvm-svn: 300091
2017-04-12 18:39:27 +00:00
Serge Pavlov 2757afdb85 Remove redundant type casts
llvm-svn: 300063
2017-04-12 14:13:00 +00:00
Nirav Dave a55dad3c33 [SDAG] Factor CandidateMatch check into lambda. NFC.
llvm-svn: 299939
2017-04-11 13:41:19 +00:00
Nirav Dave 83defd1902 [SDAG] Factor ChainMerge into helper function NFCI.
llvm-svn: 299938
2017-04-11 13:41:17 +00:00
Nirav Dave 233eb7a636 [SDAG] Reorder expensive StoreMerge Check after cheaper one. NFC
llvm-svn: 299937
2017-04-11 13:41:16 +00:00
Sam Parker 4fc5f3c02e [SelectionDAG] Check CALLSEQ_BEGIN nodes in DelayForLiveRegs
A fix for the bug reported in PR30911.

The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as
the last node to be unscheduled will gain access to the CallResource
register. But when a node is being picked, only CALLSEQ_END nodes are
checked against the CallResource and have their chains evaluated.
This then means that other CALLSEQ_BEGIN nodes can be scheduled
before the existing call sequence has been finalised. This patch adds
a check against the FrameSetup nodes in DelayForLiveRegs to prevent
this from happening.

Differential Revision: https://reviews.llvm.org/D31536

llvm-svn: 299926
2017-04-11 08:43:32 +00:00
Craig Topper 3606e732dd [SelectionDAG] TargetLowering::SimplifyDemandedBits how to properly calculate KnownZero bits for ISD::SETCC and ISD::AssertZExt
Summary:
For SETCC we aren't calculating the KnownZero bits at all. I've copied the code from computeKnownZero over for this.

For AssertZExt we were only setting KnownZero for bits that were demanded. But the upper bits are zero whether they were demanded or not.

I'm interested in fixing this because my belief is the first part of the ISD::AND handling code in SimplifyDemandedBits largely exists because of these two bugs. In that code we go to computeKnownBits for the LHS and optimize a RHS constant. Because computeKnownBits handles SETCC and AssertZExt correctly we get better information sometimes than when we call SimplifyDemandedBits on the LHS later. With these two issues fixed in SimplifyDemandedBits I was able to remove that computeKnownBits call and still pass all X86 tests. I'll submit that change in a separate patch.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31715

llvm-svn: 299839
2017-04-10 07:06:44 +00:00
Simon Dardis f7e4388e3b Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
This reverts commit r299766. This change appears to have broken the MIPS
buildbots. Reverting while I investigate.

Revert "[mips] Remove usage of debug only variable (NFC)"

This reverts commit r299769. Follow up commit.

llvm-svn: 299788
2017-04-07 17:25:05 +00:00
Simon Dardis 6470ff0b24 [SelectionDAG] Enable target specific vector scalarization of calls and returns
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.

The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.

Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.

By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.

Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".

This patch enables the MIPS backend to take either form for vector types.

Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D27845

llvm-svn: 299766
2017-04-07 13:03:52 +00:00
Nirav Dave 974f7c23ae [SDAG] Fix visitAND optimization to deal with vector extract case again.
Summary:
Fix case elided by rL298920.

Fixes PR32545.

Reviewers: eli.friedman, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31759

llvm-svn: 299688
2017-04-06 19:05:41 +00:00
Jonas Paulsson 45c936ef86 [SelectionDAG] NFC patch removing a redundant check.
Since the BUILD_VECTOR has already been checked by
isBuildVectorOfConstantSDNodes() in SelectionDAG::getNode() for a
SIGN_EXTEND_INREG, it can be assumed that Op is always either undef or a
ConstantSDNode, and Ops.size() will always equal VT.getVectorNumElements().

llvm-svn: 299647
2017-04-06 13:00:37 +00:00
Craig Topper 2ca72f4971 Revert accidental commit of r299619.
llvm-svn: 299622
2017-04-06 04:04:10 +00:00
Craig Topper 6b15606051 Revert accidental commit of r299618
llvm-svn: 299621
2017-04-06 04:03:34 +00:00
Craig Topper 5d7ece8895 bar
llvm-svn: 299619
2017-04-06 04:02:31 +00:00
Craig Topper faf5a8553c foo
llvm-svn: 299618
2017-04-06 04:02:28 +00:00
Adam Nemet d5ffdd3605 [DAGCombine] Support FMF contract in fused multiple-and-sub too
This is a follow-on to r299096 which added support for fmadd.

Subtract does not have the case where with two multiply operands we commute in
order to fuse with the multiply with the fewer uses.

llvm-svn: 299572
2017-04-05 17:58:48 +00:00
Adam Nemet 99e347fc35 [DAGCombine] Remove commented-out code from r299096
llvm-svn: 299571
2017-04-05 17:58:44 +00:00
Sanjay Patel b2f1621bb1 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Jonas Paulsson 38a2da92bc [DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type.
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created transformed
constants.

Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.

This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.

Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422

llvm-svn: 299540
2017-04-05 13:45:37 +00:00
Matt Arsenault c82768290d DAG: Fix missing legalization for any_extend_vector_inreg operands
llvm-svn: 299389
2017-04-03 21:28:13 +00:00
Craig Topper 3882613956 [DAGCombine][InstCombine] Fix inverted if condition in equivalent comments in DAGCombine and InstCombine. NFC
llvm-svn: 299378
2017-04-03 19:18:48 +00:00
Zvi Rackover d76a4d0ac6 Revert "[DAGCombine] A shuffle of a splat is always the splat itself"
This reverts commit r299047 which is incorrect because the
simplification may result in incorrect propogation of undefs to users of
the folded shuffle.

Thanks to Andrea Di Biagio for pointing this out.

llvm-svn: 299368
2017-04-03 17:41:19 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Simon Pilgrim 9daf9c047d [DAGCombiner] Check limits before accessing array element (PR32502)
llvm-svn: 299361
2017-04-03 15:27:49 +00:00
Sanjay Patel 665021e7ee [DAGCombiner] enable vector transforms for any/all {sign} bits set/clear
The code already allowed vector types in via "isInteger" (which might want
a more specific name), so use splat-friendly constant predicates to match
those types.

llvm-svn: 299304
2017-04-01 15:05:54 +00:00
Craig Topper 73250168e7 [DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector.
This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros.

Fixes PR32484.

llvm-svn: 299291
2017-04-01 04:26:20 +00:00
Quentin Colombet 35a47010b1 Revert "Instrument SDISel C++ patterns"
This reverts commit r299284.

Didn't intend to commit this :(

llvm-svn: 299286
2017-04-01 01:26:17 +00:00
Quentin Colombet b43da15602 Instrument SDISel C++ patterns
llvm-svn: 299284
2017-04-01 01:21:32 +00:00
Sanjay Patel 16d458ea0d [DAGCombiner] refactor and/or-of-setcc to get rid of duplicated code; NFCI
llvm-svn: 299266
2017-03-31 21:30:50 +00:00
Sanjay Patel 34da36e74f [DAGCombiner] add fold for 'All sign bits set?'
(and (setlt X,  0), (setlt Y,  0)) --> (setlt (and X, Y),  0)

We have 7 similar folds, but this one got away. The fact that the
x86 test with a branch didn't change is probably a separate bug. We
may also be missing this and the related folds in instcombine.

llvm-svn: 299252
2017-03-31 20:28:06 +00:00
Sanjay Patel 61d3409535 [DAGCombiner] remove redundant code and add comments; NFCI
llvm-svn: 299241
2017-03-31 18:18:58 +00:00
Simon Pilgrim 1cdbfe44b1 [DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT
Followup to D31311

llvm-svn: 299221
2017-03-31 14:21:50 +00:00
Simon Pilgrim 3c81c34d8d [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

llvm-svn: 299219
2017-03-31 13:54:09 +00:00
Simon Pilgrim 37b536e4b3 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Adam Nemet edaec6de73 [DAGCombiner] Initial support for the fast-math flag contract
Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per operation FMF to allow fusion.

The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but that this work all the way from
clang.

The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.

Differential Revision: https://reviews.llvm.org/D31169

llvm-svn: 299096
2017-03-30 18:53:04 +00:00
Ahmed Bougacha 6dd6082472 [CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.

Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.

SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.

Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).

Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed.  For now, use
'isEnabled' as a crude replacement.

This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'.  Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.

This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.

Differential Revision: https://reviews.llvm.org/D31405

llvm-svn: 299093
2017-03-30 17:49:58 +00:00
Sanjay Patel 6d5ba061f8 [DAGCombiner] add helper function for visitORLike; NFCI
This combines all of the equivalent clean-ups for foldAndOfSetCCs:
https://reviews.llvm.org/rL298938
https://reviews.llvm.org/rL298940
https://reviews.llvm.org/rL298944
https://reviews.llvm.org/rL298949
https://reviews.llvm.org/rL298950
https://reviews.llvm.org/rL299002
https://reviews.llvm.org/rL299013

The sins of code duplication are on full display here:
each function is missing a fold that wasn't copied over from its logical sibling. 

llvm-svn: 299091
2017-03-30 17:32:42 +00:00
Craig Topper eafcbe2d10 [APInt] Remove references to integerPartWidth outside of APFloat implentation.
Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class. Functions that aren't used by APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes.

llvm-svn: 299059
2017-03-30 05:49:03 +00:00
Zvi Rackover 7569436f81 [DAGCombine] A shuffle of a splat is always the splat itself
Summary:
Add a simplification:
shuffle (splat-shuffle), undef, M --> splat-shuffle

Fixes pr32449

Patch by Sanjay Patel

Reviewers: eli.friedman, RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31426

llvm-svn: 299047
2017-03-30 01:42:57 +00:00
Davide Italiano cdcdc97879 [DAGCombiner] Remove else after return. NFCI.
llvm-svn: 299022
2017-03-29 19:39:46 +00:00
Sanjay Patel ff211bb5a6 [DAGCombiner] unify type checks and add asserts; NFCI
We had a mix of type checks and usage that wasn't very clear.

llvm-svn: 299013
2017-03-29 18:08:01 +00:00
Sanjay Patel 087e922328 [DAGCombiner] reduce code duplication by rearranging checks; NFCI
llvm-svn: 299002
2017-03-29 15:37:33 +00:00
Adam Nemet 92a5cf4366 [SDAG] Remove -enable-fmf-dag
This is no longer needed as spotted by Sanjay in
https://reviews.llvm.org/D31165.

llvm-svn: 298963
2017-03-28 23:46:14 +00:00
Adam Nemet 6820f391eb [SDAG] Add AllowContract to SNodeFlags
Properly propagate the FMF from the LLVM IR to this flag.

This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.

Differential Revision: https://reviews.llvm.org/D31165

llvm-svn: 298961
2017-03-28 23:46:08 +00:00
Sanjay Patel a41a5c29f0 [DAGCombiner] reduce code duplication with local variables; NFCI
llvm-svn: 298954
2017-03-28 22:45:53 +00:00
Sanjay Patel 9747d8070b [DAG] fix formatting; NFC
llvm-svn: 298950
2017-03-28 22:25:25 +00:00
Sanjay Patel d832eddde5 [DAGCombiner] remove redundant conditions and duplicated code; NFCI
llvm-svn: 298949
2017-03-28 22:22:50 +00:00
Sanjay Patel d2a26db991 [DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCI
llvm-svn: 298944
2017-03-28 21:40:41 +00:00
Sanjay Patel 3230e4be11 [DAGCombiner] clean up foldAndOfSetCCs; NFCI
1. Fix bogus comment.
2. Early exit to reduce indent.
3. Change node pointer param to what it really is: an SDLoc.

llvm-svn: 298940
2017-03-28 20:28:16 +00:00
Sanjay Patel 16af53a395 [DAGCombiner] add helper function for and-of-setcc folds; NFC
This is just a cut and paste followed by clang-format. Clean up to follow.

llvm-svn: 298938
2017-03-28 19:58:46 +00:00
Sanjay Patel f01a1dad7f [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Follow-up to:
https://reviews.llvm.org/rL298775

llvm-svn: 298933
2017-03-28 17:23:49 +00:00
Nirav Dave 472b5efc8b [SDAG] Deal with deleted node in PromoteIntShiftOp
Deal with case that initial node is deleted during dag-combine leading
to an assertional failure in promoteIntShiftOp.

Fixes PR32420.

Reviewers: spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31403

llvm-svn: 298931
2017-03-28 17:09:49 +00:00
Nirav Dave 5b414ebe63 [SDAG] Avoid deleted SDNodes PromoteIntBinOp
Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from
being used.

Fixes PR32340 and PR32345.

Reviewers: hfinkel, dbabokin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31148

llvm-svn: 298923
2017-03-28 15:41:12 +00:00
Nirav Dave 9b5563c52c [SDAG] Fix Stale SDNode usage in visitAND
Reorder CombineTo Calls to prevent potential use of deleted node.
Fixes PR32372.

Reviewers: jnspaulsson, RKSimon, uweigand, jonpa

Reviewed By: jonpa

Subscribers: jonpa, llvm-commits

Differential Revision: https://reviews.llvm.org/D31346

llvm-svn: 298920
2017-03-28 14:11:20 +00:00
Nirav Dave 423b24ae76 [SDAG] Minor cleanup of variable usage. NFC.
llvm-svn: 298916
2017-03-28 13:39:50 +00:00
Sanjay Patel 9ebb68843e [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, 
we can replace memcmp calls with inline code that is both smaller and faster.

Differential Revision: https://reviews.llvm.org/D31290

llvm-svn: 298775
2017-03-25 16:05:33 +00:00
Simon Pilgrim dbc94db3f3 Apply clang-format as commented in D31311. NFCI.
llvm-svn: 298751
2017-03-24 23:47:41 +00:00
Nirav Dave e9ca32ae52 [SDAG] Fix zeroExtend assertion error
Move CombineTo preventing deleted node from being returned in
visitZERO_EXTEND.

Fixes PR32284.

Reviewers: RKSimon, bogner

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31254

llvm-svn: 298604
2017-03-23 15:01:50 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Matt Arsenault dce313c3cf DAG: Fold bitcast/extract_vector_elt of undef to undef
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
2017-03-21 16:20:16 +00:00
Jonas Paulsson 54c7680e1f [DAGTypeLegalizer] Handle widening truncate to vector of i1.
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().

This patch adds the needed handling, along with a test case.

Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077

llvm-svn: 298357
2017-03-21 10:24:14 +00:00
Simon Pilgrim 8424df7dea Fix constant folding of fp2int to large integers
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.

Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.

This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.

Differential Revision: https://reviews.llvm.org/D31074

llvm-svn: 298226
2017-03-19 16:50:25 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Nirav Dave 6de2c77944 Capitalize ArgListEntry fields. NFC.
llvm-svn: 298178
2017-03-18 00:43:57 +00:00
Eli Friedman 46ddab3810 [SelectionDAG] Remove redundant stores more aggressively.
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects.  This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.

It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.

Differential Revision: https://reviews.llvm.org/D29845

llvm-svn: 298156
2017-03-17 22:15:50 +00:00
Simon Pilgrim 5a68d401c7 [SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS
llvm-svn: 298108
2017-03-17 17:45:36 +00:00
Reid Kleckner 45707d4d5a Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.

In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D31052

llvm-svn: 298010
2017-03-16 22:59:15 +00:00
Jonas Paulsson 84319bfc40 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

llvm-svn: 297930
2017-03-16 07:17:12 +00:00
Zvi Rackover 48cdde0e59 [DAGCombine] Bail out if can't create a vector with at least two elements
Summary:

Fixes pr32278

Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30978

llvm-svn: 297878
2017-03-15 19:48:36 +00:00
Craig Topper bcb6093610 [CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places
This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom.

In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent.

Differential Revision: https://reviews.llvm.org/D30965

llvm-svn: 297860
2017-03-15 16:53:53 +00:00
Simon Pilgrim 018eedd9a5 [SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273)
llvm-svn: 297852
2017-03-15 16:22:24 +00:00
Nuno Lopes ae455c562d fix gcc -Wmisleading-indentation [NFC]
llvm-svn: 297816
2017-03-15 09:33:33 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Sanjay Patel 8dd99dce6c [DAG] vector div/rem with any zero element in divisor is undef
This is the backend counterpart to:
https://reviews.llvm.org/rL297390
https://reviews.llvm.org/rL297409
and follow-up to:
https://reviews.llvm.org/rL297384

It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, 
but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those 
someday.

Differential Revision: https://reviews.llvm.org/D30826

llvm-svn: 297762
2017-03-14 18:06:28 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Nirav Dave 4fc8401abf Recommitting Craig Topper's patch now that r296476 has been recommitted.
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.

This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 297698
2017-03-14 01:42:23 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Amaury Sechet d1ec5d54cf Use setBits in SelectionDAG
Summary: As per title.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30836

llvm-svn: 297559
2017-03-11 11:24:03 +00:00
Simon Pilgrim 7dedbfa89d [SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits
llvm-svn: 297492
2017-03-10 18:36:46 +00:00
Amaury Sechet 62e0759d56 [SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC.
Summary:
Depends on D30379

This improves the state of things for the sub class of operation.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30436

llvm-svn: 297482
2017-03-10 17:26:44 +00:00
Amaury Sechet 69fa16c810 [SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO.
Summary: As per title. This is extracted from D29872 and I threw SADDO in.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30379

llvm-svn: 297479
2017-03-10 17:06:52 +00:00
Simon Pilgrim b02667c469 [APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.

Differential Revision: https://reviews.llvm.org/D30780

llvm-svn: 297458
2017-03-10 13:44:32 +00:00
Amaury Sechet e7d102cf02 [DAGCombiner] Do various combine on uaddo.
Summary: This essentially does the same transform as for ADC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30417

llvm-svn: 297416
2017-03-09 22:47:00 +00:00
Amaury Sechet 10425de063 [DAGCombiner] Do various combine on usubo.
Summary: This essentially does the same transform as for SUBC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30437

llvm-svn: 297404
2017-03-09 19:28:00 +00:00
Sanjay Patel df21979db7 [DAG] recognize div/rem by 0 as undef before trying constant folding
As discussed in the review thread for rL297026, this is actually 2 changes that 
would independently fix all of the test cases in the patch:

1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before 
   foldBinopIntoSelect() as a matter of efficiency.

I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().

I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).

Differential Revision:
https://reviews.llvm.org/D30741

llvm-svn: 297384
2017-03-09 15:02:25 +00:00
Matt Arsenault 9a3fd87523 DAG: Check no signed zeros instead of unsafe math attribute
llvm-svn: 297354
2017-03-09 01:36:39 +00:00
Eli Friedman c2c2e21d77 [DAGCombine] Simplify ISD::AND in GetDemandedBits.
This helps in cases involving bitfields where an AND is exposed by
legalization.

Differential Revision: https://reviews.llvm.org/D30472

llvm-svn: 297249
2017-03-08 00:56:35 +00:00
Sanjay Patel 7f18ec50ba [DAG] refactor related div/rem folds; NFCI
This is known incomplete and not called in the right order relative to
other folds, but that's the current behavior. I'm just trying to clean
this up before making actual functional changes to make the patch smaller.

The logic here should mimic the IR equivalents that are in InstSimplify's
simplifyDivRem().

llvm-svn: 297086
2017-03-06 22:32:40 +00:00
Sanjay Patel 7f7947bf41 [DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.

This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html

Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.

The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.

llvm-svn: 297026
2017-03-06 16:36:42 +00:00
Sanjay Patel 6b029a5380 [DAG] fix formatting; NFC
llvm-svn: 297015
2017-03-06 15:27:57 +00:00
Sanjay Patel 5273afd4bb [DAG] fix typo in comment; NFC
llvm-svn: 297011
2017-03-06 15:07:43 +00:00
Simon Pilgrim 584d6d9d91 [SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions
Found by fuzz testing after rL296985 landed

llvm-svn: 296989
2017-03-05 15:52:18 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Craig Topper 6ffc044b2f [DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.

llvm-svn: 296981
2017-03-05 01:08:16 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Florian Hahn 6406f98342 [legalize-types] Remove stale entries from SoftenedFloats.
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).

When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).

This fixes the fp128 test failures with expensive checks for X86.

I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.

Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon

Reviewed By: chh

Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D29265

llvm-svn: 296964
2017-03-04 12:00:35 +00:00
Simon Pilgrim 6dfab414db Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call
llvm-svn: 296886
2017-03-03 17:03:52 +00:00
Simon Pilgrim cf12b5e1a6 Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296879
2017-03-03 16:35:57 +00:00
Simon Pilgrim 10754abe7e Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296871
2017-03-03 14:25:46 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Sanjay Patel 7884dcb788 [DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI
llvm-svn: 296824
2017-03-02 21:56:43 +00:00
Sanjay Patel 209b0f9aad [DAG] improve documentation comments; NFC
llvm-svn: 296808
2017-03-02 20:48:08 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Sanjay Patel f7aba7ba22 fix typo in comment; NFC
llvm-svn: 296760
2017-03-02 16:37:24 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflow.
Summary:
This can be used to optimize large multiplications after legalization.

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Benjamin Kramer 0e429606b0 [DAGCombiner] Remove non-ascii character and reflow comment.
llvm-svn: 296690
2017-03-01 22:10:43 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Ahmed Bougacha 20b3e9a835 [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall.
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.

If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).

We need to erase them, as we know they aren't necessary (in addition to
being incorrect).

We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.

llvm-svn: 296552
2017-03-01 00:43:42 +00:00
Sanjay Patel ea61ea9f19 [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC
llvm-svn: 296502
2017-02-28 18:41:49 +00:00
Craig Topper 419f145ebb [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 296486
2017-02-28 16:52:05 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Eugene Zelenko fa912a7151 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 296404
2017-02-27 22:45:06 +00:00
Arnold Schwaighofer b2605f31ed ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.

rdar://30536186

llvm-svn: 296395
2017-02-27 22:12:06 +00:00
Matt Arsenault 4a7cc16e89 Revert "DAG: Check if extract_vector_elt is legal or custom"
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.

llvm-svn: 296393
2017-02-27 21:59:07 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Artur Pilipenko f7196c8d9e [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets
This pattern is essentially a i16 load from p+1 address:

  %p1.i16 = bitcast i8* %p to i16*
  %p2.i8 = getelementptr i8, i8* %p, i64 2
  %v1 = load i16, i16* %p1.i16
  %v2.i8 = load i8, i8* %p2.i8
  %v2 = zext i8 %v2.i8 to i16
  %v1.shl = shl i16 %v1, 8
  %res = or i16 %v1.shl, %v2

Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.

llvm-svn: 296336
2017-02-27 13:04:23 +00:00
Artur Pilipenko c43b20a43b [DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper
This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296332
2017-02-27 11:42:54 +00:00
Artur Pilipenko f2c26e0bf2 [DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node
This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296331
2017-02-27 11:40:14 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Artyom Skrobov ac56719231 No need to copy the variable [NFC]
llvm-svn: 296259
2017-02-25 17:18:09 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Sanjay Patel 4a4fbe162f [DAG] add convenience function to get -1 constant; NFCI
llvm-svn: 296004
2017-02-23 19:02:33 +00:00
Bill Seurer 8e48f416ad [DAGCombiner] revert r295336
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.

Reverting as per the developer's request.

llvm-svn: 295849
2017-02-22 16:27:33 +00:00
Matt Arsenault f0a4823b91 DAG: Check if extract_vector_elt is legal or custom
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.

llvm-svn: 295782
2017-02-21 22:47:27 +00:00
Simon Pilgrim c0dc9a4913 Strip trailing whitespace.
llvm-svn: 295653
2017-02-20 11:56:43 +00:00
Simon Pilgrim 50b958c07a [SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes.
Thanks to Mikael Holmén for the initial test case

llvm-svn: 295652
2017-02-20 11:55:58 +00:00
Artyom Skrobov be31754094 Remove redundant call to GluedNodes.back() [NFC]
llvm-svn: 295607
2017-02-19 16:56:18 +00:00
Sanjay Patel 7f2e58972c [DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI
I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.

llvm-svn: 295463
2017-02-17 17:13:27 +00:00
Simon Pilgrim 0429c0cf8b Fix signed/unsigned comparison warning.
llvm-svn: 295453
2017-02-17 16:01:16 +00:00
Simon Pilgrim 511d788a95 [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).

This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.

The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).

Differential Revision: https://reviews.llvm.org/D29454

llvm-svn: 295451
2017-02-17 15:14:48 +00:00
Sanjay Patel 5573042035 [DAGCombiner] improve readability; NFCI
llvm-svn: 295447
2017-02-17 14:21:59 +00:00
Artur Pilipenko 85d758299e [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit -r295314 with PowerPC and AMDGPU tests updated.

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295336
2017-02-16 17:07:27 +00:00
Artur Pilipenko a1b384c4ce Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some of AMDGPU and PowerPC tests to fail.

llvm-svn: 295316
2017-02-16 13:04:46 +00:00
Artur Pilipenko daaa0c0f7d [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295314
2017-02-16 12:53:26 +00:00
Matt Arsenault 5de8dc9cf5 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

llvm-svn: 295242
2017-02-15 22:02:42 +00:00
Michael Kuperstein ba80db39d7 [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

llvm-svn: 295213
2017-02-15 18:37:26 +00:00
Craig Topper 96ec7a23e3 [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.

This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

llvm-svn: 295152
2017-02-15 05:57:16 +00:00
Aditya Nandakumar bb0483bc8e [Tablegen] Instrumenting table gen DAGGenISelDAG
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV

llvm-svn: 295081
2017-02-14 18:32:41 +00:00
Artyom Skrobov dc66a82dc7 Removing a redundant assignment
llvm-svn: 295055
2017-02-14 14:44:01 +00:00
Arnold Schwaighofer 8f3df731dc swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

llvm-svn: 294982
2017-02-13 19:58:28 +00:00
Quentin Colombet fbae5fcb96 [FastISel] Add a diagnostic to warm on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

llvm-svn: 294970
2017-02-13 17:38:59 +00:00
Craig Topper 3668bde371 [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932
2017-02-13 04:53:33 +00:00
Craig Topper aa46204ed9 [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.

llvm-svn: 294930
2017-02-12 23:49:49 +00:00
Sanjay Patel 0557a44287 [TargetLowering] fix SETCC SETLT folding with FP types
The bug was introduced with:
https://reviews.llvm.org/rL294863

...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.

llvm-svn: 294924
2017-02-12 23:07:52 +00:00
Craig Topper b633adedc7 [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats.
llvm-svn: 294875
2017-02-11 22:57:09 +00:00
Sanjay Patel 63499b61c9 [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits
I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.

The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210

Differential Revision: https://reviews.llvm.org/D29687

llvm-svn: 294863
2017-02-11 18:01:55 +00:00
Simon Pilgrim bfb1747806 [DAGCombine] Allow vector constant folding of any value type before type legalization
The patch comes in 2 parts:

1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.

2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.

Fix for PR30760

Differential Revision: https://reviews.llvm.org/D29568

llvm-svn: 294749
2017-02-10 14:37:25 +00:00
Craig Topper a9f1121896 [SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.

I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.

Having said that though, I think we should probably fix legalize vector ops to log what its doing.

Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc

Reviewed By: RKSimon

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D29554

llvm-svn: 294711
2017-02-10 05:05:57 +00:00
Geoff Berry 7e320c2485 [SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.

Reviewers: bogner, MatzeB, qcolombet

Subscribers: mcrosier, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D29680

llvm-svn: 294605
2017-02-09 18:28:17 +00:00
Artur Pilipenko 4a64031954 [DAGCombiner] Support non-zero offset in load combine
Enable folding patterns which load the value from non-zero offset:

  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29394

llvm-svn: 294582
2017-02-09 12:06:01 +00:00
Artur Pilipenko 045ab08252 [DAGCombiner] NFC. Mark ByteProvider accessors as const
llvm-svn: 294494
2017-02-08 17:59:34 +00:00
Amaury Sechet 4b946916ac [DAGCombiner] Push truncate through adde when the carry isn't used.
Summary: As per title.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29528

llvm-svn: 294394
2017-02-08 00:32:36 +00:00
Reid Kleckner 0887d44a61 [SDAGISel] Simplify some SDAGISel code, NFC
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.

Also use range-based for loops where possible.

llvm-svn: 294329
2017-02-07 18:42:53 +00:00
Sanjay Patel 8c99ca3df0 [TargetLowering] fix formatting and comments for ShrinkDemandedConstant; NFC
llvm-svn: 294325
2017-02-07 18:04:26 +00:00
Daniel Jasper 84b3cc394d Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)"
This reverts commit r294186.

On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another dagcombine that does the exact
opposite triggering and endless loop consuming more and more memory.

Chandler has started at creating a reduced test case and we'll attach it
as soon as possible.

llvm-svn: 294288
2017-02-07 08:57:50 +00:00
Artur Pilipenko d3464bf9ad [DAGCombiner] Support bswap as a part of load combine patterns
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29397

llvm-svn: 294201
2017-02-06 17:48:08 +00:00
Amaury Sechet e674f5c758 Add ADDC to SelectionDAG::computeKnownBits and ComputeNumSignBits.
Summary: As per title.

Reviewers: bkramer, sunfish, lattner, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29521

llvm-svn: 294188
2017-02-06 14:59:06 +00:00
Amaury Sechet 8a3b32941d [DAGCombiner] Make DAGCombiner smarter about overflow
Summary: Leverage it to transform addc into add.

Reviewers: mkuper, spatel, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29524

llvm-svn: 294187
2017-02-06 14:54:49 +00:00
Amaury Sechet 1d466f598e [DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)
Summary: This is extracted from D29443 .

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29564

llvm-svn: 294186
2017-02-06 14:28:39 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Geoff Berry 76ca8c2b34 [SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.

Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.

Reviewers: bogner, qcolombet, MatzeB, atrick

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D29495

llvm-svn: 294129
2017-02-05 18:28:14 +00:00
Amaury Sechet 143902c29f [DAGCombiner] Leverage add's commutativity
Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29541

llvm-svn: 294125
2017-02-05 14:22:20 +00:00
Craig Topper 42b83f8d6e [DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.
Based on similar code for INSERT_VECTOR_ELT.

llvm-svn: 294110
2017-02-04 23:26:39 +00:00
Craig Topper 04dce84ead [DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC
llvm-svn: 294109
2017-02-04 23:26:37 +00:00
Craig Topper ceaf9c1633 [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
llvm-svn: 294108
2017-02-04 23:26:34 +00:00
Amaury Sechet 6e2d8e49ec Formatting in DAGCombiner. NFC
llvm-svn: 294091
2017-02-04 13:01:53 +00:00
Eugene Zelenko 502d0bc28e [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce TargetInstrInfo.h dependencies.

llvm-svn: 294084
2017-02-04 02:00:53 +00:00
Ahmed Bougacha 9677cc6fb7 [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.

I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.

llvm-svn: 294028
2017-02-03 19:11:19 +00:00
Alexey Bataev a0d9f2582b [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292

llvm-svn: 294003
2017-02-03 12:28:40 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Amaury Sechet f3e421d6e9 Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC
llvm-svn: 293903
2017-02-02 16:07:44 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Florian Hahn 7a5ec55fb3 [legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765
2017-02-01 13:01:33 +00:00
Nicolai Haehnle 8813d5d221 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635
2017-01-31 14:35:37 +00:00
Simon Pilgrim ffe2535cf6 Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Justin Bogner 8f520a73b2 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim 0a5ab5c4db Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Matt Arsenault 32e6bfa20f DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
Matt Arsenault 0c687390fe DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Craig Topper 135da1faf5 [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
2017-01-29 06:08:02 +00:00
Craig Topper 4753736abf [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414
2017-01-29 04:38:21 +00:00
Craig Topper d15730902b [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413
2017-01-29 04:38:19 +00:00
Craig Topper 24cdbe8fa6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
llvm-svn: 293412
2017-01-29 04:38:16 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Andrew Kaylor a0a1164ce4 Add intrinsics for constrained floating point operations
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226
2017-01-26 23:27:59 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Craig Topper 001aad7da7 [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.
llvm-svn: 293152
2017-01-26 05:38:46 +00:00
Tim Northover 470f070b7d SDag: fix how initial loads are formed when splitting vector ops.
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088
2017-01-25 20:58:26 +00:00
Krzysztof Parzyszek ee9aa3ffee Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
llvm-svn: 293077
2017-01-25 19:29:04 +00:00
Artur Pilipenko bc93452420 Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
2017-01-25 09:10:07 +00:00
Artur Pilipenko 41c0005aa3 [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Matt Arsenault 8a27aee6ae DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00
Geoff Berry 92a286ae5a [SelectionDAG] Handle inverted conditions when splitting into multiple branches.
Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions.  These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use.  This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.

For example codegen X & !(Y | Z) as:
    jmp_if_X TmpBB
    jmp FBB
  TmpBB:
    jmp_if_notY Tmp2BB
    jmp FBB
  Tmp2BB:
    jmp_if_notZ TBB
    jmp FBB

Reviewers: bogner, MatzeB, qcolombet

Subscribers: llvm-commits, hiraditya, mcrosier, sebpop

Differential Revision: https://reviews.llvm.org/D28380

llvm-svn: 292944
2017-01-24 16:36:07 +00:00
Craig Topper ff272ad4f3 [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR
Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.

For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.

Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.

Reviewers: RKSimon, delena

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29000

llvm-svn: 292876
2017-01-24 02:36:59 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Matt Arsenault 4e305c6c1e DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

llvm-svn: 292842
2017-01-23 22:48:53 +00:00
Matt Arsenault 7030661364 DAG: Allow legalization of fcanonicalize vector types
llvm-svn: 292814
2017-01-23 18:52:26 +00:00
Simon Pilgrim fb32eea1b4 [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)
This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

llvm-svn: 292528
2017-01-19 22:41:22 +00:00
Mikael Holmen 2074e7497b [DAG] Don't increase SDNodeOrder for dbg.value/declare.
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

llvm-svn: 292485
2017-01-19 13:55:55 +00:00
Matt Arsenault f411071d63 DAG: Consider nnan in isKnownNeverNaN
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Ahmed Bougacha 9e5a085cf1 Revert "[TLI] Robustize SDAG proto checking by merging it into TLI."
This reverts commit r292189, as it causes issues on SystemZ bots.

llvm-svn: 292191
2017-01-17 03:31:00 +00:00
Ahmed Bougacha c018efd680 [TLI] Robustize SDAG proto checking by merging it into TLI.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.

I don't think it's worth supporting either of these (IMO) broken
testcases.  Instead, fix them to be more correct.

llvm-svn: 292189
2017-01-17 03:10:06 +00:00
Simon Pilgrim 3e91519a1c [SelectionDAG] Add knownbits support for BITREVERSE
llvm-svn: 292130
2017-01-16 14:49:26 +00:00
Simon Pilgrim db73dbcc7c [SelectionDAG] Add support for BITREVERSE constant folding
We were relying on constant folding of the legalized instructions to do what constant folding we had previously

llvm-svn: 292114
2017-01-16 13:39:00 +00:00
Malcolm Parsons 17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Craig Topper 7af39837a9 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

llvm-svn: 291650
2017-01-11 04:59:25 +00:00
Craig Topper 577d258569 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
llvm-svn: 291645
2017-01-11 04:02:23 +00:00
Matt Arsenault e482403e1c DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

llvm-svn: 291642
2017-01-11 02:02:12 +00:00
Eugene Zelenko c4ad1ce068 [Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291641
2017-01-11 01:45:03 +00:00
Matt Arsenault def496c04b Remove unused CONVERT_RNDSAT intrinsics
llvm-svn: 291607
2017-01-10 22:38:02 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Simon Dardis 548a53f5ee [mips] Fix Mips MSA instrinsics
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.

The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.

Reviewers: vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D25438

llvm-svn: 291571
2017-01-10 16:40:57 +00:00
Craig Topper 588c734b0f [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC
I guess previously we just assumed if the result type was i1, then the condition type must also be i1?

llvm-svn: 291548
2017-01-10 07:42:57 +00:00
Craig Topper d915d6ba57 [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it.
llvm-svn: 291534
2017-01-10 04:12:19 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now instead the two produced umax:es return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
David Majnemer 9e04befb09 [SelectionDAG] Rework lowerRangeToAssertZExt
Utilize ConstantRange to make it easier to interpret range metadata.

llvm-svn: 291211
2017-01-06 02:43:28 +00:00
David Majnemer eaba06cffa [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-tosint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts r279223 behavior and adds more tests and
documentations.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Florian Hahn f872d230ad [selectiondag] Check PromotedFloats map during expansive checks.
Summary:
`PromotedFloats` needs to be checked in 
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expansive checks for ARM fp16 tests.

Reviewers: baldrick, bogner, arsenm

Subscribers: arsenm, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28187

llvm-svn: 290796
2017-01-01 13:58:27 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Simon Pilgrim 0d66d29678 [SelectionDAG] Early out from computeKnownBits when we know we will have no common bits.
Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits.

llvm-svn: 290490
2016-12-24 12:59:35 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Matt Arsenault 485dacd90c DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.

llvm-svn: 290317
2016-12-22 04:39:45 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. 
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Joel Jones 8980ba643e Fix name typo in SelectonDAG
llvm-svn: 289969
2016-12-16 18:22:54 +00:00
Chandler Carruth ba5de63bc3 Add extra headers that got deleted by my revert in r289916 but for which
new usage had already grown in the file.

llvm-svn: 289917
2016-12-16 04:08:31 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjay Patel afee21a5b2 [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489

llvm-svn: 289738
2016-12-14 22:59:14 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Artur Pilipenko f3ee444010 Add a couple of assertions to the load combine code introduced by r289538
llvm-svn: 289646
2016-12-14 11:55:47 +00:00
Artur Pilipenko 469fcd2afd Use more detailed assertion messages in the code introduced by r289538
llvm-svn: 289545
2016-12-13 16:26:15 +00:00
Artur Pilipenko 79d1255e26 Fix a buildbot failure introduced by r289538
Build failed because of unused variable in product mode.

llvm-svn: 289540
2016-12-13 14:55:31 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Artur Pilipenko 01e86444a0 Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user
llvm-svn: 289537
2016-12-13 14:16:02 +00:00
Simon Pilgrim 9dc67c0101 [SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI.
We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this.

llvm-svn: 289534
2016-12-13 13:36:27 +00:00
Philip Reames 51387a8c28 [Statepoints] Reuse stack slots more than once within a basic block
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exact once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse.

The root error here is that the bitmap uses by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again.

Differential Revision: https://reviews.llvm.org/D25243

llvm-svn: 289509
2016-12-13 01:21:15 +00:00
Simon Pilgrim 040a36c176 [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits
Pre-commit as discussed on D27657

llvm-svn: 289425
2016-12-12 10:29:43 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Nicolai Haehnle f08dc90253 [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
Summary:
Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types.

Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and
a sequence of half-width multiplications otherwise. Promote uses a wider
multiply.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD

Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24956

llvm-svn: 289050
2016-12-08 14:08:14 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim ff79f31328 [SelectionDAG] Removed old knownbits TODO comment. NFCI.
EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range.

llvm-svn: 288913
2016-12-07 15:31:12 +00:00
Eli Friedman 0a76e3241f [CodeGen] Fix result type for SMULO/UMULO legalization
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.

This fixes issue https://github.com/rust-lang/rust/issues/37829

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D27154

llvm-svn: 288857
2016-12-06 22:49:36 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Simon Pilgrim 29c17f3f58 Avoid repeated calls to Op.getOpcode(). NFCI.
llvm-svn: 288814
2016-12-06 14:50:09 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out of bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Justin Bogner 35c5e58f8c SDAG: Avoid a large, usually empty SmallVector in a recursive function
This SmallVector is using up 128 bytes on the stack every time despite
almost always being empty[1], and since this function can recurse quite
deeply that adds up to a lot of overhead. We've seen this run afoul of
ulimits in some cases with ASAN on.

Replacing the SmallVector with a std::vector trades an occasional heap
allocation for vastly less stack usage.

[1]: I gathered some stats on an internal test suite and the vector
was non-empty in only 45,000 of 10,000,000 calls to this function.

llvm-svn: 288441
2016-12-02 00:11:01 +00:00
Matthias Braun d0ee66c2e9 Move most EH from MachineModuleInfo to MachineFunction
Recommitting r288293 with some extra fixes for GlobalISel code.

Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288405
2016-12-01 19:32:15 +00:00
Nicolai Haehnle da7e4017c6 [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC)
Summary: Suggested by @spatel in D26602.

Reviewers: spatel, hfinkel

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27260

llvm-svn: 288336
2016-12-01 14:04:13 +00:00
Eric Christopher e70b7c3dfb Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction"
This apprears to have broken the global isel bot:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console

This reverts commit r288293.

llvm-svn: 288322
2016-12-01 07:50:12 +00:00
Matthias Braun ed14cb0604 Move most EH from MachineModuleInfo to MachineFunction
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288293
2016-11-30 23:49:01 +00:00
Matthias Braun ef331eff5a Move VariableDbgInfo from MachineModuleInfo to MachineFunction
VariableDbgInfo is per function data, so it makes sense to have it with
the function instead of the module.

This is a necessary step to have machine module passes work properly.

Differential Revision: https://reviews.llvm.org/D27186

llvm-svn: 288292
2016-11-30 23:48:50 +00:00
Nicolai Haehnle 73a9a27b5a [SelectionDAG] Refactor TargetLowering::expandMUL (NFC)
Summary: Further preparation for the expansion of MUL_LOHI added in D24956.

Reviewers: efriedma, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27064

llvm-svn: 288248
2016-11-30 16:26:33 +00:00
Warren Ristow d9777c1dbb Test commit. Comment changes. NFC.
llvm-svn: 288100
2016-11-29 02:37:13 +00:00
Sanjay Patel 2bd32b05fb [DAG] clean up foldSelectCCToShiftAnd(); NFCI
llvm-svn: 288088
2016-11-28 23:05:55 +00:00
Sanjay Patel 1cf9aff659 [DAG] add helper function for selectcc --> and+shift transforms; NFC
llvm-svn: 288073
2016-11-28 21:47:41 +00:00
Nirav Dave a413361798 Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"
This reverts commit r287773 which caused issues with ppc64le builds.

llvm-svn: 288035
2016-11-28 14:30:29 +00:00
Simon Pilgrim c5fb167df0 Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287941
2016-11-25 17:25:21 +00:00
Craig Topper 8c4cdf06db [DAGCombine] Teach DAG combine that if both inputs of a vselect are the same, then the condition doesn't matter and the vselect can be removed.
Selects with scalar condition already handle this correctly.

llvm-svn: 287904
2016-11-24 21:48:52 +00:00
Nicolai Haehnle 934470f536 [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC)
Summary: Reduce indentation level; preparation for D24956.

Reviewers: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27063

llvm-svn: 287831
2016-11-23 22:14:20 +00:00
Nirav Dave cf34556330 [DAG] Improve loads-from-store forwarding to handle TokenFactor
Forward store values to matching loads down through token
factors. Factored from D14834.

Reviewers: jyknight, hfinkel

Subscribers: hfinkel, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D26080

llvm-svn: 287773
2016-11-23 16:48:35 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment. 
More type legalization work will come in the next patches.

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Matt Arsenault b30d2aca58 DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

llvm-svn: 287593
2016-11-21 22:56:42 +00:00
Simon Pilgrim 5662074ba3 [VectorLegalizer] Remove EVT::getSizeInBits code duplications. NFCI.
We were calling SVT.getSizeInBits() several times in a row - just call it once and reuse the result.

llvm-svn: 287556
2016-11-21 18:24:44 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00
Simon Pilgrim 7a6b6d5656 Fix spelling mistakes in SelectionDAG comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287487
2016-11-20 13:14:57 +00:00
Simon Pilgrim e40900dddd [SelectionDAG] Add knowbits support for CONCAT_VECTOR opcode
llvm-svn: 287387
2016-11-18 22:21:22 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Simon Pilgrim c4d733cd6a Fix spelling in comment. NFC.
llvm-svn: 287222
2016-11-17 12:03:05 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Ahmed Bougacha bd6ce9a247 [CodeGen] Pass references, not pointers, to MMI helpers. NFC.
While there, rename them to follow the coding style.

llvm-svn: 287169
2016-11-16 22:25:03 +00:00
Ahmed Bougacha 456dce8a84 [CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC.
They're not SelectionDAG- or FunctionLoweringInfo-specific.  They
are, however, specific to building MMI from IR.
We could make them members, but it's nice having MMI be a "simple" data
structure and this logic kept separate.

This also lets us reuse them from GlobalISel.

llvm-svn: 287167
2016-11-16 22:24:56 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Joerg Sonnenberger 1a7eec68a9 Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases.  As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

llvm-svn: 286951
2016-11-15 12:39:46 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Simon Pilgrim 807f9cf243 [SelectionDAG] Add support for vector demandedelts in BSWAP opcodes
llvm-svn: 286582
2016-11-11 11:51:29 +00:00
Simon Pilgrim 813721e98a [SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes
llvm-svn: 286578
2016-11-11 11:23:43 +00:00
Simon Pilgrim 0652227814 [SelectionDAG] Add support for vector demandedelts in UDIV opcodes
llvm-svn: 286576
2016-11-11 10:47:24 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Simon Pilgrim 38f0045cb0 [SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes
llvm-svn: 286516
2016-11-10 22:41:49 +00:00
Simon Pilgrim fe3a54371d [SelectionDAG] Add support for splatted vectors in SUB opcode
llvm-svn: 286509
2016-11-10 21:57:42 +00:00
Simon Pilgrim d67af68f06 [SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes
llvm-svn: 286481
2016-11-10 17:43:52 +00:00
Simon Pilgrim 33fef8e865 Use common SDLoc. NFCI.
llvm-svn: 286473
2016-11-10 16:47:09 +00:00
Simon Pilgrim ee187fd6e7 [SelectionDAG] Add support for vector demandedelts in MUL opcodes
llvm-svn: 286471
2016-11-10 16:27:42 +00:00
Simon Pilgrim ca57e53ded [SelectionDAG] Add support for vector demandedelts in SRA opcodes
llvm-svn: 286461
2016-11-10 15:05:09 +00:00
Simon Pilgrim 37c9034bd6 [DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes
We were failing to extract a constant splat shift value if the shifted value was being masked.

The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.

llvm-svn: 286454
2016-11-10 14:35:09 +00:00
Simon Pilgrim 3bf99c056a [SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes
llvm-svn: 286448
2016-11-10 13:52:42 +00:00
Simon Pilgrim 778596bf59 [TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.

The comment said we shouldn't handle constant splat vectors with undef elements. But the the actual code was returning false if the build vector contained no undef elements....

This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).

The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.

Differential Revision: https://reviews.llvm.org/D26031

llvm-svn: 286238
2016-11-08 15:07:01 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Richard Smith 857efb0880 Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel.
Differential Revision: https://reviews.llvm.org/D26292

llvm-svn: 286119
2016-11-07 16:47:20 +00:00
Simon Pilgrim 39df78e384 [SelectionDAG] Add support for vector demandedelts in XOR opcodes
llvm-svn: 286075
2016-11-06 16:49:19 +00:00
Simon Pilgrim dd4809a603 [SelectionDAG] Add support for vector demandedelts in OR opcodes
llvm-svn: 286071
2016-11-06 16:29:09 +00:00
Nicolai Haehnle bea772c6dc DAGCombiner: fix use-after-free when merging consecutive stores
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.

Reviewers: chandlerc, bogner, niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25601

llvm-svn: 285916
2016-11-03 14:25:04 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Simon Pilgrim 93f2f7fb6c Use !operator to test if APInt is zero/non-zero. NFCI.
Avoids APInt construction and slower comparisons.

llvm-svn: 285822
2016-11-02 15:41:15 +00:00
Joerg Sonnenberger 5e31b3ad93 Simplify.
llvm-svn: 285802
2016-11-02 12:45:28 +00:00
Sanjay Patel 70c5f02d25 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 

llvm-svn: 285656
2016-10-31 23:28:45 +00:00
Sanjay Patel 339a51ac13 [DAG] x | x --> x
llvm-svn: 285522
2016-10-30 18:19:35 +00:00
Sanjay Patel 13aee345ca [DAG] x & x --> x
llvm-svn: 285521
2016-10-30 18:13:30 +00:00
Simon Pilgrim 75a697a17e [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285494
2016-10-29 11:29:39 +00:00
Davide Italiano 86168b23cf [DAGCombiner] Fix a crash visiting `AND` nodes.
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.

Differential Revision:  https://reviews.llvm.org/D26098

llvm-svn: 285480
2016-10-28 23:55:32 +00:00
Justin Bogner db6b6a7f0c SDAG: Make sure we use an allocatable reg class when we create this vreg
As per the discussion on r280783, if constrainRegClass fails we need
to call getAllocatableClass like we did before that commit.

llvm-svn: 285467
2016-10-28 22:42:54 +00:00
Simon Pilgrim d9189891fc [SelectionDAG] computeKnownBits - early-out if any BUILD_VECTOR element has no known bits
No need to check the remaining elements - no common known bits are available.

llvm-svn: 285399
2016-10-28 14:07:44 +00:00
Simon Pilgrim 8c043061e5 [SelectionDAG] Tidyup UDIV computeKnownBits implementation
No need to clear KnownOne2/KnownZero2 bits as the next call to computeKnownBits will overwrite them anyway

llvm-svn: 285398
2016-10-28 13:42:23 +00:00
Simon Pilgrim 755cef1ba8 [SelectionDAG] Increment computeKnownBits recursion depth for SMIN/SMAX/UMIN/UMAX like all other ops
llvm-svn: 285397
2016-10-28 13:13:16 +00:00
Juergen Ributzka 5cee232be4 Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits"
This seems to have increased LTO compile time bejond 2x of previous builds.
See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/

llvm-svn: 285381
2016-10-28 04:01:12 +00:00
Simon Pilgrim 01e755eab1 [DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285296
2016-10-27 14:29:28 +00:00
Nemanja Ivanovic 275853e777 Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.

This fixes PR 30715.

llvm-svn: 285231
2016-10-26 19:51:35 +00:00
Tom Stellard 284cf32ab4 LegalizeDAG: Support promoting [US]DIV and [US]REM operations
Summary:
AMDGPU will need this one i16 is added as a legal type.  This is tested by:

test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll

Reviewers: bogner, efriedma

Subscribers: efriedma, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25699

llvm-svn: 285199
2016-10-26 14:52:25 +00:00
Simon Pilgrim de86241a09 [DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors
llvm-svn: 285129
2016-10-25 22:01:09 +00:00
Simon Pilgrim f534573e8c [DAGCombiner] Enable srem(x.y) -> urem(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285123
2016-10-25 21:20:18 +00:00
Simon Pilgrim 4ebb04510a [DAGCombiner] Enable sdiv(x.y) -> udiv(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285118
2016-10-25 20:56:42 +00:00
Evandro Menezes 601f4cb9f7 Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases.  The motivation is
that many contemporary processors typically perform case switches fairly
quickly.

Differential revision: https://reviews.llvm.org/D25212

llvm-svn: 285099
2016-10-25 19:11:43 +00:00
Zvi Rackover 124470a202 [DAGCombine] Preserve shuffles when one of the vector operands is constant
Summary:
Do *not* perform combines such as:

    vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
    ->
    build_vector(X, C0, C1, C2)

Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).

Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25524

llvm-svn: 285063
2016-10-25 12:14:19 +00:00
Simon Pilgrim e3e6585c2d [SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats
Use isConstOrConstSplat helper.

Also use APInt instead of getZExtValue directly to avoid out of range issues.

llvm-svn: 285033
2016-10-24 21:47:19 +00:00
Simon Pilgrim d8ec09c74f Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 285025
2016-10-24 20:56:52 +00:00
Peter Collingbourne 16e9b944e9 CodeGen: Do not add a global's address space to the folding set profile.
It is already part of the type (which is part of the global, which is already
being added), so there's no need to do it.

llvm-svn: 285002
2016-10-24 18:56:09 +00:00
Sanjay Patel 9ca028c2d6 [DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant
llvm-svn: 284953
2016-10-23 23:13:31 +00:00
Simon Pilgrim d06641d3dc Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 284949
2016-10-23 20:17:21 +00:00
Sanjay Patel ca92c36e01 [DAG] enhance computeKnownBits to handle SHL with vector splat constant
Also, use APInt to avoid crashing on types larger than vNi64.

llvm-svn: 284874
2016-10-21 20:16:27 +00:00
Sanjay Patel 81029f6a76 [DAG] fold negation of sign-bit
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value

This is the DAG equivalent of:
https://reviews.llvm.org/rL284649

plus the fold for the NUW case which already existed in InstSimplify.

Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().

llvm-svn: 284844
2016-10-21 17:24:26 +00:00
Sanjay Patel cbaba93ce8 [DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.

Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x

Note that although the test cases include a 'sin' function call, I'm side-stepping the 
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.

Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.

Differential Revision: https://reviews.llvm.org/D25297

llvm-svn: 284824
2016-10-21 14:36:58 +00:00
Pirama Arumuga Nainar 05b0f93ad3 Fix *_EXTEND_VECTOR_INREG legalization
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation.  If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)

Updated the promote-vec3.ll test after this change.  The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.

Reviewers: RKSimon

Subscribers: llvm-commits, srhines

Differential Revision: https://reviews.llvm.org/D25790

llvm-svn: 284752
2016-10-20 17:56:36 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Simon Pilgrim 618d3aedaf [DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284717
2016-10-20 11:10:21 +00:00
Simon Pilgrim e32d0f8413 Merged nested ifs. NFCI.
llvm-svn: 284616
2016-10-19 17:30:24 +00:00
Simon Pilgrim a20aeea998 [DAGCombiner] Add general constant vector support to (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284613
2016-10-19 17:12:22 +00:00
Reid Kleckner f7ad5341d0 [WinEH] Allow catchpads to reuse the same catch object
This code used a regular when it should have used a multimap.

llvm-svn: 284612
2016-10-19 17:08:23 +00:00
Sanjay Patel 3a3aaf67e0 [DAG] optimize negation of bool
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.

This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395

There may be others, so please let me know if you see any regressions.

Differential Revision: https://reviews.llvm.org/D25485

llvm-svn: 284611
2016-10-19 16:58:59 +00:00
Simon Pilgrim 4554e161be [DAGCombiner] Add general constant vector support to (shl (sra x, c1), c1) -> (and x, (shl -1, c1))
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284608
2016-10-19 16:15:30 +00:00
Simon Pilgrim c2e9724909 [DAGCombiner] Add general constant vector support to (shl (mul x, c1), c2) -> (mul x, c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284607
2016-10-19 15:59:28 +00:00
Simon Pilgrim 7dcb6e572e [DAGCombiner] Just call isConstOrConstSplat directly. NFCI.
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.

llvm-svn: 284578
2016-10-19 11:28:15 +00:00
Simon Pilgrim b2ca2505cc [DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector
llvm-svn: 284574
2016-10-19 08:57:37 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Simon Pilgrim 25e9628978 [DAGCombiner] Add splatted vector support to (udiv x, (shl pow2, y)) -> x >>u (log2(pow2)+y)
llvm-svn: 284491
2016-10-18 16:36:00 +00:00
Simon Pilgrim 65e0c73875 Strip trailing whitespace (NFCI)
llvm-svn: 284478
2016-10-18 13:44:00 +00:00
Sanjay Patel 523cd8290a [DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485

and fixed with:
https://reviews.llvm.org/rL284395

More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.

Differential Revision: https://reviews.llvm.org/D25685

llvm-svn: 284424
2016-10-17 20:41:39 +00:00
Sanjay Patel a7cab58055 [DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC
As noted in:
https://reviews.llvm.org/D25685

This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch. 
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.

llvm-svn: 284421
2016-10-17 20:26:46 +00:00
Sanjay Patel 2cf6bfaf73 [DAG] optimize away an arithmetic-right-shift of a 0 or -1 value
This came up as part of:
https://reviews.llvm.org/D25485

Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.

llvm-svn: 284395
2016-10-17 15:58:28 +00:00
James Molloy aa79b19a3e [SDAG] Use ABI type alignment for constant pools when optimizing for size
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.

In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.

llvm-svn: 284381
2016-10-17 12:54:07 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Sanjay Patel 72b5ff646d [DAG] avoid creating illegal node when transforming negated shifted sign bit
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.

llvm-svn: 284268
2016-10-14 19:46:31 +00:00
Tom Stellard ab61007914 TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses.  SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:

and v1, v0, 0xffffff
mul24 v2, v1, v1      ; Multiply ignoring high 8-bits.

To:

mul24 v2, v0, v0

Where before this would not be optimized, because v1 has multiple uses.

Reviewers: bogner, arsenm

Subscribers: nhaehnle, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24964

llvm-svn: 284266
2016-10-14 19:14:26 +00:00
Sanjay Patel 00fc7a6159 [DAG] add folds for negated shifted sign bit
The same folds exist in InstCombine already.

This came up as part of:
https://reviews.llvm.org/D25485

llvm-svn: 284239
2016-10-14 14:26:47 +00:00
Nicolai Haehnle 86e72d98dd Fix use-after-frees
Extracted from D25313, as suggested by Justin Bogner.

llvm-svn: 284220
2016-10-14 09:49:51 +00:00
Craig Topper 40feb7f157 [DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.

llvm-svn: 284204
2016-10-14 06:00:42 +00:00
Sanjay Patel 98d0ea64ca [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 284170
2016-10-13 22:27:10 +00:00
Tom Stellard f80c1875a3 LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE
Summary:
This operation is promoted the same way was ISD::BSWAP.  This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.

Reviewers: bogner, hfinkel

Subscribers: hfinkel, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25202

llvm-svn: 284163
2016-10-13 21:03:49 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Simon Pilgrim cb59b5257c [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines
llvm-svn: 284122
2016-10-13 14:04:35 +00:00
Simon Pilgrim fa8fadc0e5 [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding
llvm-svn: 284117
2016-10-13 12:49:31 +00:00
Simon Pilgrim 833b8a2071 [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization
Improves commutation potential

llvm-svn: 284113
2016-10-13 12:05:20 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Simon Pilgrim 08190943cb [DAGCombiner] Update most ADD combines to support general vector combines
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.

Differential Revision: https://reviews.llvm.org/D25374

llvm-svn: 284015
2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov 081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Michael Kuperstein 7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Arnold Schwaighofer 9103e268cf Silence -Wunused-but-set-variable warning
llvm-svn: 283927
2016-10-11 19:49:29 +00:00
Sanjay Patel 8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel 8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel 38a42e4bfa [DAG] simplify logic; NFC
llvm-svn: 283885
2016-10-11 14:14:30 +00:00
Sanjay Patel 907ae69125 [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 283884
2016-10-11 14:04:24 +00:00
Sanjay Patel 9609f3d6c7 [DAG] fix formatting; NFC
llvm-svn: 283878
2016-10-11 13:47:43 +00:00
Hal Finkel fcd2421667 [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.

Patch by Edward Jones, thanks!

Differential Revision: https://reviews.llvm.org/D24459

llvm-svn: 283797
2016-10-10 20:45:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Arnold Schwaighofer 3f25658143 swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.

Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.

Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.

When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.

This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.

rdar://28300923

llvm-svn: 283617
2016-10-07 22:06:55 +00:00
Sanjay Patel 14c02052d6 [DAG] clean up foldSelectOfConstants(); NFCI
Rename variables, simplify logic. 
Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too.

llvm-svn: 283613
2016-10-07 21:55:42 +00:00
Sanjay Patel ecaf343fe7 [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC
We're missing at least 3 other similar folds based on what we have in InstCombine. 

llvm-svn: 283596
2016-10-07 20:47:51 +00:00
Vedant Kumar 7beb423765 Delete some dead code in SelectionDAG (NFC)
Differential Revision: https://reviews.llvm.org/D24435

llvm-svn: 283505
2016-10-06 22:53:43 +00:00
Pirama Arumuga Nainar cc152ac794 Handle *_EXTEND_VECTOR_INREG during Integer Legalization
Summary:
These nodes need legalization for 3-element vectors.  This commit
handles the legalization and adds tests for zext and sext.

This fixes PR30614.

Reviewers: RKSimon, srhines

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25268

llvm-svn: 283496
2016-10-06 21:27:05 +00:00
Michael Kuperstein 7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Peter Collingbourne d799d28540 FastISel: Remove unused/un-overridden entry points. NFCI.
llvm-svn: 283366
2016-10-05 19:25:20 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Mehdi Amini 3e021be3b6 Use StringRef in FastISel API (NFC)
llvm-svn: 283291
2016-10-05 01:37:29 +00:00
whitequark 7c4fe0e9a3 [SelectionDAG] Fix calling convention in expansion of ?MULO.
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.

The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.

Thanks to James Miller <james@aatch.net> for help in debugging.

Differential Revision: https://reviews.llvm.org/D25223

llvm-svn: 283203
2016-10-04 09:07:49 +00:00
Sanjay Patel f7df85af87 fix formatting; NFC
llvm-svn: 283115
2016-10-03 15:18:36 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet i.e we crash trying to emit tail call
code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Sanjay Patel 284582b6d4 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Pawel Bylica c397f0b272 [CodeGen] Fix invalid shift in mul expansion
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478

llvm-svn: 281403
2016-09-13 21:55:41 +00:00
Michael Kuperstein 59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Simon Pilgrim 4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Simon Pilgrim bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Ayman Musa 0c2da88f82 Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Michael Kuperstein efc0667583 [DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI.
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.

llvm-svn: 281283
2016-09-13 00:57:43 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumption about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While node legalization we tried to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00
Matt Arsenault 6cda10c950 Remove unnecessary call to getAllocatableRegClass
This reapplies r252565 and r252674, effectively reverting r252956.

This allows VS_32/VS_64 to be unallocatable like they should be.

llvm-svn: 280783
2016-09-07 06:16:45 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Aditya Kumar 356f79d535 [SelectionDAGBuilder] Add const to relevant places
Reviewers: hans, evandro, sebpop

Differential Revision: https://reviews.llvm.org/D24112

llvm-svn: 280430
2016-09-01 23:35:26 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Michael Kuperstein b4743597bd Rename some variables to have meaningful names. NFC.
llvm-svn: 280391
2016-09-01 18:24:42 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Krzysztof Parzyszek 354832e585 Propagate TBAA info in SelectionDAG::getIndexedLoad
Patch by Pranav Bhandarkar.

llvm-svn: 279998
2016-08-29 19:50:15 +00:00
Igor Breger 24281b4740 Fixed a bug in type legalizer for masked gather.
The problem occurs when the Node doesn't updated in place , UpdateNodeOperation() return the node that already exist.
In this case assert fail in PromoteIntegerOperand() , N have 2 results ( val + chain).

Differential Revision: http://reviews.llvm.org/D23756

llvm-svn: 279961
2016-08-29 09:12:31 +00:00
Quentin Colombet e063e1f68a [SelectionDAG] Do not run the ISel process on already selected code.
Right now, this cannot happen, but with the fall back path of GlobalISel
it will show up eventually.

llvm-svn: 279877
2016-08-26 22:32:55 +00:00
Michael Kuperstein 260daed147 Reuse an SDLoc throughout a function. NFC.
llvm-svn: 279767
2016-08-25 18:50:56 +00:00
Justin Lebar 1972e222ea [SelectionDAG] Use a union of bitfield structs for SDNode::SubclassData.
Summary:
This greatly simplifies our handling of SDNode::SubclassData.

NFC, hopefully.  :)

See discussion in D23035 for discussion about the design API of these
bitfields.

Reviewers: chandlerc

Subscribers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D23036

llvm-svn: 279537
2016-08-23 17:18:11 +00:00
Pete Cooper 036b94dad3 Fix some more asserts after r279466.
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types.  There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types.  These paths should be ok
to just print the name instead of crashing.

The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.

Thanks to Björn Pettersson for pointing out that there were more crashes.

llvm-svn: 279528
2016-08-23 16:23:45 +00:00
Simon Pilgrim 02b13d4d3c Use SDValue::getOpcode() helper instead of via SDValue::getNode()
llvm-svn: 279381
2016-08-20 20:04:18 +00:00
James Molloy 7ee640f9b6 [CodeGen] Fix a trivial type conversion bug dating back to pre-2008
The heuristic above this code is incredibly suspect, but disregarding that it mutates the cast opcode so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.

Fixes PR29041.

llvm-svn: 279223
2016-08-19 08:38:50 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Ayman Musa 71b43c5c1d Fix bug in DAGBuilder for getelementptr with expanded vector.
Replacing the usage of MVT with EVT in case the vector type is expanded.
Differential Revision: https://reviews.llvm.org/D23306

llvm-svn: 278913
2016-08-17 07:52:15 +00:00
Ayman Musa c96f421ad4 First commit (test commit) - Adding empty line.
llvm-svn: 278910
2016-08-17 07:37:34 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Pierre Gousseau 051db7d838 [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.

Differential Revision: https://reviews.llvm.org/D23445

llvm-svn: 278799
2016-08-16 13:53:53 +00:00
Eli Friedman 98151d6440 Fix typo in lowering for fp128 ueq.
Regression from r259791.

Differential Revision: https://reviews.llvm.org/D23374

llvm-svn: 278750
2016-08-15 21:46:19 +00:00
Wolfgang Pieb dfad9b20c9 Local variables whose address is taken and passed on to a call are described
in debug info using their stack slots instead of as an indirection of param reg + 0
offset. This is done by detecting FrameIndexSDNodes in SelectionDAG and generating
FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack
location operands.

Differential Revision: http://reviews.llvm.org/D23283

llvm-svn: 278703
2016-08-15 18:18:26 +00:00
Duncan P. N. Exon Smith f197b1f78f ADT: Remove all ilist_iterator => pointer casts, NFC
Remove all ilist_iterator to pointer casts.  There were two reasons for
casts:

  - Checking for an uninitialized (i.e., null) iterator.  I added
    MachineInstrBundleIterator::isValid() to check for that case.

  - Comparing an iterator against the underlying pointer value while
    avoiding converting the pointer value to an iterator.  This is
    occasionally necessary in MachineInstrBundleIterator, since there is
    an assertion in the constructors that the underlying MachineInstr is
    not bundled (but we don't care about that if we're just checking for
    pointer equality).

To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.

  - The implicit constructors now use enable_if to exclude
    const-iterator => non-const-iterator conversions from overload
    resolution (previously it was a compiler error on instantiation, now
    it's SFINAE).

  - The == and != operators are now global (friends), and are not
    templated.

  - MachineInstrBundleIterator has overloads to compare against both
    const_pointer and const_reference.  This avoids the implicit
    conversions to MachineInstrBundleIterator that assert, instead just
    checking the address (and I added unit tests to confirm this).

Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore.  It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.

llvm-svn: 278478
2016-08-12 05:05:36 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Simon Pilgrim 85c7ea86ae [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.

i.e. 
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )

Differential Revision: https://reviews.llvm.org/D23330

llvm-svn: 278211
2016-08-10 10:50:53 +00:00
Simon Pilgrim 76964e3140 [DAGCombiner] Better support for shifting large value type by constants
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.

This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.

Differential Revision: https://reviews.llvm.org/D23007

llvm-svn: 278141
2016-08-09 17:39:11 +00:00
Diana Picus 4dd6c249ac [SelectionDAG] Refactor visitInlineAsm a bit. NFCI.
This shaves off ~100 lines from visitInlineAsm.

llvm-svn: 277987
2016-08-08 08:54:39 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Diana Picus ddddbc2440 Typo fix in comment. NFC
llvm-svn: 277704
2016-08-04 08:25:08 +00:00
Elliot Colp 82b1468a4d Disable shrinking of SNaN constants
When expanding FP constants, we attempt to shrink doubles to floats and perform an extending load.
However, on SystemZ, and possibly on other targets (I've only confirmed the problem on SystemZ), the FP extending load instruction may convert SNaN into QNaN, or may cause an exception. So in the general case, we would still like to shrink FP constants, but SNaNs should be left as doubles.

Differential Revision: https://reviews.llvm.org/D22685

llvm-svn: 277602
2016-08-03 15:09:21 +00:00
Michael Kuperstein c97da7f3a4 [DAGCombine] Make sext(setcc) combine respect getBooleanContents
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.

This fixes PR28504.

llvm-svn: 277371
2016-08-01 19:39:49 +00:00
Weiming Zhao 812fde3603 DAG: avoid duplicated truncating for sign extended operand
Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncaton if the bits to be tested are no less than origianl
bits.

Reviewers: eli.friedman

Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D22933

llvm-svn: 277252
2016-07-29 23:33:48 +00:00
Andrew Kaylor b99d1cc7ed Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)

llvm-svn: 277189
2016-07-29 18:23:18 +00:00
Nirav Dave 563d6f8614 Cleanup TransferDbgValues
[DAG] Check debug values for invalidation before transferring and mark
old debug values invalid when transferring to another SDValue.

This fixes PR28613.

Reviewers: jyknight, hans, dblaikie, echristo

Subscribers: yaron.keren, ismail, llvm-commits

Differential Revision: https://reviews.llvm.org/D22858

llvm-svn: 277135
2016-07-29 11:49:32 +00:00
Nirav Dave b7c72717c9 Fix DbgValue handling in SelectionDAG.
[DAG] Relocate TransferDbgValues in ReplaceAllUsesWith(SDValue, SDValue)
to before we modify the CSE maps.

llvm-svn: 277027
2016-07-28 19:48:39 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Simon Pilgrim 10bf0ff879 [DAGCombiner] Use APInt directly to detect out of range shift constants
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match

As mentioned on D22726

llvm-svn: 276855
2016-07-27 10:30:55 +00:00
Andrew Kaylor f990fa5f7b Reverting r276771 due to MSan failures.
llvm-svn: 276824
2016-07-27 01:19:24 +00:00
Andrew Kaylor 3104a6bad0 Re-committing r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 276771
2016-07-26 17:23:13 +00:00
Simon Pilgrim 820f87a72d [SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types
An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size.

After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts.

In doing so we can significantly reduce the number of operations required.

Differential Revision: https://reviews.llvm.org/D21578

llvm-svn: 276432
2016-07-22 16:46:25 +00:00
Ahmed Bougacha 29333c9de6 [FastISel] Ignore @llvm.assume.
llvm-svn: 276410
2016-07-22 12:54:53 +00:00
Elena Demikhovsky 2c0780b8e5 AVX-512: Fixed BT instruction selection.
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).

I added the new sequence (truncate (a >>n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
Chih-Hung Hsieh 4d9f2c154d [X86] Accept SELECT op code for x86-64 fp128 type
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.

Differential Revision: http://reviews.llvm.org/D21758

llvm-svn: 275818
2016-07-18 17:20:09 +00:00
Simon Dardis d32a2d30cb [inlineasm] Propagate operand constraints to the backend
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:

offset + frame index

when the frame is resolved.

By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.

For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D21615

llvm-svn: 275786
2016-07-18 13:17:31 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Justin Lebar 0af80cd6f0 [CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand.
Summary:
Previously we took an unsigned.

Hooray for type-safety.

Reviewers: chandlerc

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D22282

llvm-svn: 275591
2016-07-15 18:26:59 +00:00
Michael Kuperstein 4d36e77048 Fix copy/paste bug in r275340.
llvm-svn: 275343
2016-07-13 23:28:00 +00:00
Michael Kuperstein be837fa40f [DAG] Correctly chain masked loads
If a masked loads is not added to the chain, it should not reset the chain's
root.

This fixes the remaining part of PR28515.

llvm-svn: 275340
2016-07-13 23:23:40 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
Sanjay Patel bb7d87ee25 fix documentation comments; NFC
llvm-svn: 275101
2016-07-11 20:50:39 +00:00
Sanjay Patel fedc01ad76 [DAG] make isConstantSplatVector() available to the rest of lowering
llvm-svn: 275025
2016-07-10 21:27:06 +00:00
Sanjay Patel 9bedcdb5f5 fix documentation comments; NFC
llvm-svn: 275021
2016-07-10 21:02:16 +00:00
Sanjay Patel 303326541b reformat, fix comments/names; NFCI
llvm-svn: 275015
2016-07-10 13:05:57 +00:00
Benjamin Kramer 4d09892e9a Give helper classes/functions internal linkage. NFC.
llvm-svn: 275014
2016-07-10 11:28:51 +00:00
Sanjay Patel 6170b4bebd fix documentation comments; NFC
llvm-svn: 274981
2016-07-09 18:52:07 +00:00
Matt Arsenault 3fb8f9eabf Reapply r274829 with fix for FP vectors
llvm-svn: 274937
2016-07-08 21:25:33 +00:00
Nico Weber 28410c6846 Revert r274829, it caused PR28472.
llvm-svn: 274916
2016-07-08 19:52:19 +00:00
Duncan P. N. Exon Smith 1b824c9e43 SelectionDAG: Avoid implicit iterator conversions in SelectionDAGBuilder, NFC
llvm-svn: 274907
2016-07-08 19:23:12 +00:00
Duncan P. N. Exon Smith dca9bffa31 SelectionDAG: Avoid implicit iterator conversions in SelectionDAGISel, NFC
llvm-svn: 274904
2016-07-08 19:11:40 +00:00
Duncan P. N. Exon Smith 6135f3f1cb SelectionDAG: Avoid implicit iterator conversions in ScheduleDAGSDNodes, NFC
llvm-svn: 274903
2016-07-08 19:07:09 +00:00
Duncan P. N. Exon Smith 10383ecd76 SelectionDAG: Avoid implicit iterator conversions in FastISel, NFC
llvm-svn: 274899
2016-07-08 18:36:41 +00:00
Sjoerd Meijer 1ee119f897 Do not expand SDIV when compiling for minimum code size
Differential Revision: http://reviews.llvm.org/D22139

llvm-svn: 274855
2016-07-08 15:32:01 +00:00
Sjoerd Meijer 46c4c3d31c Addressing post-commit comments regarding not expanding UDIV;
we don't expand only when compiling for minimum code size.

llvm-svn: 274847
2016-07-08 14:17:09 +00:00
Sjoerd Meijer a625af3feb Code size optimisation: don't expand a div to a mul and and a shift sequence.
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.

Differential Revision: http://reviews.llvm.org/D22131

llvm-svn: 274843
2016-07-08 12:54:43 +00:00
Matt Arsenault c3a6fe6ecd Bug 28444: Fix assertion when extract_vector_elt has mismatched type
For some reason extract_vector_elt is sometimes allowed to have
a different result type than the vector element type.

llvm-svn: 274829
2016-07-08 07:05:00 +00:00
Andrew Kaylor 65fa0704aa Include SelectionDAGISel in the opt-bisect process
Differential Revision: http://reviews.llvm.org/D21143

llvm-svn: 274786
2016-07-07 18:55:02 +00:00
Tim Shen 1c3c0afc53 [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it.
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.

This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.

Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D21692

llvm-svn: 274644
2016-07-06 17:44:03 +00:00
Balaram Makam d4acd7ed10 Revert r259387: "AArch64: Implement missed conditional compare sequences."
This reverts commit r259387 because it inserts illegal code after legalization
    in some backends where i64 OR type is illegal for example.

llvm-svn: 274573
2016-07-05 20:24:05 +00:00
Matt Arsenault 2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Craig Topper d83f818a3e [CodeGen] Make the code that detects a if a shuffle is really a concatenation of the inputs more general purpose.
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.

This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.

llvm-svn: 274483
2016-07-04 06:19:35 +00:00
Craig Topper d1eca0f32c [CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.
Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef.

llvm-svn: 274472
2016-07-03 19:37:12 +00:00
Craig Topper 90d7664a22 [CodeGen] Cleanup getVectorShuffle a bit to take advantage of its new ArrayRef argument and its begin/end iterators. Also use 'int' type for number of elements and loop iterators to remove several typecasts. No functional change intended.
llvm-svn: 274338
2016-07-01 06:54:51 +00:00
Craig Topper 2bd8b4b180 [CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended.
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.

llvm-svn: 274337
2016-07-01 06:54:47 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Rafael Espindola db6bd02185 Delete unused includes. NFC.
llvm-svn: 274225
2016-06-30 12:19:16 +00:00
Craig Topper 3a011de10c [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.
llvm-svn: 274097
2016-06-29 03:29:12 +00:00
Craig Topper f067a043fb [CodeGen] Make ShuffleVectorSDNode::commuteMask take a MutableArrayRef instead of SmallVectorImpl. NFC.
llvm-svn: 274095
2016-06-29 03:29:06 +00:00
Rafael Espindola b1556c42ce Use isPositionIndependent in a few more places.
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".

llvm-svn: 274055
2016-06-28 20:13:36 +00:00
Rafael Espindola 3beef8d6db Move shouldAssumeDSOLocal to Target.
Should fix the shared library build.

llvm-svn: 273958
2016-06-27 23:15:57 +00:00
Matt Arsenault f0f721a682 DAGCombiner: Don't narrow volatile vector loads + extract
llvm-svn: 273909
2016-06-27 19:31:04 +00:00
Rafael Espindola 0a68bf9627 Use isPositionIndependent predicate. NFC.
llvm-svn: 273830
2016-06-26 22:38:44 +00:00
Rafael Espindola 12bb38d367 Use isPositionIndependent predicate.
llvm-svn: 273828
2016-06-26 22:30:06 +00:00
Rafael Espindola ae0d866f56 Refactor a duplicated predicate. NFC.
llvm-svn: 273826
2016-06-26 22:13:55 +00:00
Craig Topper 9c809bee1c [SelectionDAG] Use DAG.getCommutedVectorShuffle instead of reimplementing it.
llvm-svn: 273802
2016-06-26 05:10:49 +00:00
Rafael Espindola 88ae09e9be Use shouldAssumeDSOLocal in isOffsetFoldingLegal.
This makes it slightly more powerful for dynamic-no-pic.

llvm-svn: 273704
2016-06-24 18:48:36 +00:00
Nirav Dave bfdb483755 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273585
2016-06-23 17:52:57 +00:00
Peter Collingbourne 6717803485 Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as it caused pr28270.
llvm-svn: 273518
2016-06-23 00:06:17 +00:00
Nirav Dave 96beb7dee5 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after fixing over-aggressive assertion

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273456
2016-06-22 19:03:26 +00:00
Wei Ding 0526e7f8d9 AMDGPU: Add convergent flag to INLINEASM instruction.
Differential Revision: http://reviews.llvm.org/D21214

llvm-svn: 273455
2016-06-22 18:51:08 +00:00
Krzysztof Parzyszek e116d500a7 [SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.

Differential Revision: http://reviews.llvm.org/D20376

llvm-svn: 273403
2016-06-22 12:54:25 +00:00
Simon Pilgrim bb8a40fdd2 Strip trailing whitespace
llvm-svn: 273264
2016-06-21 14:37:39 +00:00
Daniel Sanders bf2c03ee69 [arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos.
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.

Reviewers: t.p.northover

Subscribers: t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D21431

llvm-svn: 273259
2016-06-21 12:29:03 +00:00
David Majnemer e61e4bfd87 Replace silly uses of 'signed' with 'int'
llvm-svn: 273244
2016-06-21 05:10:24 +00:00