Commit Graph

26301 Commits

Author SHA1 Message Date
Cameron McInally 156eb28289 [CodeGen] Add comment about FSUB <-> FNEG xforms
Differential Revision: https://reviews.llvm.org/D61741

llvm-svn: 360366
2019-05-09 19:28:52 +00:00
Florian Hahn be10bc71f9 [DAGCombiner] Limit number of nodes explored as store candidates.
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61511

llvm-svn: 360357
2019-05-09 17:05:52 +00:00
Simon Pilgrim 38ef296265 [CodeGenPrepare] Ensure we get a non-null result from getTrueOrFalseValue. NFCI.
llvm-svn: 360328
2019-05-09 10:51:26 +00:00
Markus Lavin 92d5db524e Make sub-registers index names case sensitive in the MIRParser
Prior to this change sub-register index names are assumed to be lower
case (but they are printed with original casing). This means that if a
target has some upper case characters in its sub-register names then
mir-export directly followed by mir-import is not possible. This also
means that sub-register indices currently are (and will continue to be)
slightly inconsistent with register names which are printed and assumed
to be lower case.

As the current textual representation of mir has a few inconsistencies
in this area it is a bit arbitrary how to address the matter. This
change is towards the direction that we feel is most correct (i.e. case
sensitivity).

Differential Revision: https://reviews.llvm.org/D61499

llvm-svn: 360318
2019-05-09 08:29:04 +00:00
Pengfei Wang c05aad0532 Bugfix for nullptr check by klocwork
Klocwork static check:
Pointer from call to function `DebugLoc::operator DILocation *() const `
may be NULL and will be dereference in function `printExtendedName```
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D61715

llvm-svn: 360317
2019-05-09 08:09:21 +00:00
Bjorn Pettersson 8d19e94f13 [CodeGen] Use "DL.getPointerSizeInBits" instead of "8 * DL.getPointerSize". NFC
llvm-svn: 360315
2019-05-09 08:07:36 +00:00
Leonard Chan 95b7abdcc5 [SelectionDAG] Expand ADD/SUBCARRY
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.

Differential Revision: https://reviews.llvm.org/D61411

llvm-svn: 360303
2019-05-09 01:17:48 +00:00
Eric Christopher c93f56d39e Temporarily Revert "[DebugInfo] Terminate more location-list ranges at the end of blocks"
as it was causing significant compile time regressions.

This reverts commit r359426 while we come up with testcases and additional ideas.

llvm-svn: 360301
2019-05-08 23:54:03 +00:00
Sanjay Patel 902b3ecdad [SelectionDAG] fold 'fneg undef' to undef
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.

The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
  Legalizing: t1: f32 = undef
  Trying to expand node
  Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>

Differential Revision: https://reviews.llvm.org/D61516

llvm-svn: 360296
2019-05-08 22:19:52 +00:00
Quentin Colombet 157427245a [RegAllocFast] Scan physcial reg definitions before assigning virtual reg definitions
When assigning the definitions of an instruction we were updating
the available registers while walking the definitions. Some of
those definitions may be from physical registers and thus, they are
not available for other definitions to take, but by the time we see
that we may have already assign these registers to another
virtual register.

Fix that by walking through all the definitions and mark as unavailable
the physical register definitions, then do the virtual register assignments.

PR41790

llvm-svn: 360278
2019-05-08 18:30:26 +00:00
Craig Topper 493aec3ef5 [FastISel][X86] Support FNeg instruction in target independent fast isel handling
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom

Differential Revision: https://reviews.llvm.org/D61624

llvm-svn: 360273
2019-05-08 17:27:08 +00:00
Simon Pilgrim 2788ad3ee2 [LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI.
Fixes static analyzer undefined/out-of-range shift warnings.

llvm-svn: 360245
2019-05-08 11:22:10 +00:00
Simon Pilgrim 97a0c54179 Fix cppcheck operator precedence warning. NFCI.
llvm-svn: 360234
2019-05-08 10:07:34 +00:00
QingShan Zhang 0e71a6e755 [CodeGenPrepare] Don't split the store if it is volatile
We shouldn't split the store when it is volatile.

Differential Revision: https://reviews.llvm.org/D61169

llvm-svn: 360228
2019-05-08 07:32:12 +00:00
QingShan Zhang e065af6a42 [NFC] Add a static function to do the endian check
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. 

Differential Revision: https://reviews.llvm.org/D61236

llvm-svn: 360226
2019-05-08 07:21:37 +00:00
Austin Kerbow 6e6480e216 [CodeGen] Rename DEBUG_TYPE for default hazard recognizer.
Summary:
The DEBUG_TYPE of the default hazard recognizer should be updated to
match the DEBUG_TYPE of the machine-scheduler pass.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61359

llvm-svn: 360198
2019-05-07 22:09:04 +00:00
Adrian Prantl e6e8db5e9b Debug Info: Support address space attributes on rvalue references.
DWARF5, 2.12 20ff says that

Any debugging information entry representing a pointer or reference
type [may have a DW_AT_address_class attribute].

The existing code (https://reviews.llvm.org/D29670) seems to take a
quite literal interpretation of that wording. I don't see a reason why
an rvalue reference isn't a reference type in the spirit of that
paragraph. This patch allows rvalue references to also have address
spaces.

rdar://problem/50511483

Differential Revision: https://reviews.llvm.org/D61625

llvm-svn: 360176
2019-05-07 17:42:38 +00:00
Florian Hahn a9d6c32eaf [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61397

llvm-svn: 360171
2019-05-07 16:47:27 +00:00
Simon Pilgrim 3044ac058b Avoid use-after-move warnings by using swap instead. NFCI.
Swap should be as quick in these cases, and leaves the original variables in a known (empty) state.

llvm-svn: 360164
2019-05-07 15:45:00 +00:00
Craig Topper c6d445f9c1 [FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.

I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.

Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma

Reviewed By: cameron.mcinally

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61622

llvm-svn: 360111
2019-05-07 04:25:24 +00:00
Fangrui Song da82ce99b7 [DebugInfo] Delete TypedDINodeRef
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.

Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D61369

llvm-svn: 360108
2019-05-07 02:06:37 +00:00
Amy Huang 987b969bab Fix bug in getCompleteTypeIndex in codeview debug info
Summary:
When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because
the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map.

Reviewers: rnk

Subscribers: aprantl, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61460

llvm-svn: 360101
2019-05-06 23:37:03 +00:00
Craig Topper 39f1a97417 [FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg.
We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here.

llvm-svn: 360097
2019-05-06 23:09:09 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent it's creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Craig Topper ad56843dd7 [SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits.
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.

This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.

There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.

Fixes PR41748.

Differential Revision: https://reviews.llvm.org/D61582

llvm-svn: 360069
2019-05-06 19:50:14 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Craig Topper f723490e76 [SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error.
Based on PR41748, not all cases are handled in this function.

llvm_unreachable is treated as an optimization hint than can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.

Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.

llvm-svn: 360008
2019-05-06 04:01:49 +00:00
Roman Lebedev 1a1b922177 [NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61013

llvm-svn: 359996
2019-05-05 18:59:39 +00:00
Roman Lebedev e3b1d82b53 [NFC] PHINode: introduce replaceIncomingBlockWith() function, use it
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61011

llvm-svn: 359995
2019-05-05 18:59:30 +00:00
Simon Pilgrim 0f89b76b84 [SelectionDAG] Use any_of/all_of where possible. NFCI.
llvm-svn: 359974
2019-05-05 10:30:04 +00:00
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Simon Pilgrim 5d3b100750 [DAGCombine] Remove repeated variables. NFCI.
llvm-svn: 359915
2019-05-03 18:20:28 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Simon Pilgrim 308b5ec1ff [TargetLowering] SimplifySetCC - remove repeated variable. NFCI.
Also reduce scope of Temp variable.

llvm-svn: 359911
2019-05-03 18:02:33 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Simon Pilgrim d857f64c31 [SelectionDAG] CreateTopologicalOrder - don't use iterator
We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector

Found by cppcheck

llvm-svn: 359900
2019-05-03 15:50:37 +00:00
Simon Pilgrim bc876df3a5 [TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI.
Only ever used in one block

llvm-svn: 359890
2019-05-03 14:38:24 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Simon Pilgrim e798e3a346 [TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI.
Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings.

llvm-svn: 359877
2019-05-03 12:55:25 +00:00
Anton Afanasyev 6d08b8dbae Revert "[MIR] Add simple PRE pass to MachineCSE"
This reverts commit 9c20156de3.
It breaks stage 2 of clang-ppc64be-linux-multistage.

llvm-svn: 359875
2019-05-03 12:36:22 +00:00
Simon Pilgrim 42d2b604b5 [SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI.
llvm-svn: 359873
2019-05-03 11:32:00 +00:00
Simon Pilgrim bfd00a6440 [SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI.
llvm-svn: 359872
2019-05-03 11:11:03 +00:00
Anton Afanasyev 9c20156de3 [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

Reviewers: RKSimon

Subscribers: hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56772

llvm-svn: 359870
2019-05-03 10:30:59 +00:00
Quentin Colombet c9256cc6ba [IRTranslator] Use the alloc size instead of the store size when translating allocas
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.

For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.

llvm-svn: 359856
2019-05-03 01:23:56 +00:00
Eli Friedman 0b61d220c9 [AArch64][Windows] Compute function length correctly in unwind tables.
The primary fix here is to WinException.cpp: we need to exclude jump
tables when computing the length of a function, or else we fail to
correctly compute the length. (We can only compute the number of bytes
consumed by certain assembler directives after the entire file is
parsed. ".p2align" is one of those directives, and is used by jump table
generation.)

The secondary fix, to MCWin64EH, is to make sure we don't silently
miscompile if we hit a similar situation in the future.

It's possible we could extend ARM64EmitUnwindInfo so it allows function
bodies that contain assembler directives, but that's a lot more
complicated; see the FIXME in MCWin64EH.cpp.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 .

Differential Revision: https://reviews.llvm.org/D61095

llvm-svn: 359849
2019-05-03 00:10:45 +00:00
Craig Topper e8a1cde886 [SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree
As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR

Differential Revision: https://reviews.llvm.org/D61463

llvm-svn: 359836
2019-05-02 22:26:26 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Tim Northover ee2474df9f DAG: allow DAG pointer size different from memory representation.
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.

Pointers get zero-extended whenever they are loaded, and truncated prior to
stores.  In addition, a few not quite so obvious locations need updating:

  * A GEP that has not been marked inbounds needs to enforce the IR-documented
    2s-complement wrapping at the memory pointer size. Inbounds GEPs are
    undefined if they overflow the address space, so no additional operations
    are needed.
  * Signed comparisons would give incorrect results if performed on the
    zero-extended values.

This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.

llvm-svn: 359676
2019-05-01 12:37:30 +00:00
Sanjay Patel 0387bf5269 [SelectionDAG] remove div-by-zero constant folding restriction
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.

We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).

This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.

llvm-svn: 359566
2019-04-30 14:37:15 +00:00
Sjoerd Meijer 0ed4619679 [TargetLowering] findOptimalMemOpLowering. NFCI.
This was a local static funtion in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.

Differential Revision: https://reviews.llvm.org/D59766

llvm-svn: 359543
2019-04-30 10:09:15 +00:00
Fangrui Song 7bce25cd7d [AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr
Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers.
A unique_ptr makes the ownership clearer.

llvm-svn: 359541
2019-04-30 09:14:02 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Markus Lavin a475da36eb [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

This re-commit fixes issues reported in the first one. Namely deref was
inserted under wrong conditions and additionally the deref_size argument
was incorrectly encoded.

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 359535
2019-04-30 07:58:57 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), 
if adde is not legal for the target. Even it's at type-legalize phase. 
Because adde is special and will not be legalized at operation-legalize phase later.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Simon Pilgrim 9b17b80a0e computePolynomialFromPointer - add missing early-out return for non-pointer types.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359486
2019-04-29 19:25:16 +00:00
Daniel Sanders 8f079844d0 [globalisel] Improve Legalizer debug output
* LegalizeAction should be printed by name rather than number
* Newly created instructions are incomplete at the point the observer first sees
  them. They are therefore recorded in a small vector and printed just before
  the legalizer moves on to another instruction. By this point, the instruction
  must be complete.

llvm-svn: 359481
2019-04-29 18:45:59 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Jeremy Morse 055aee1d8a [DebugInfo] Terminate more location-list ranges at the end of blocks
This patch fixes PR40795, where constant-valued variable locations can
"leak" into blocks placed at higher addresses. The root of this is that
DbgEntityHistoryCalculator terminates all register variable locations at
the end of each block, but not constant-value variable locations.

Fixing this requires constant-valued DBG_VALUE instructions to be
broadcast into all blocks where the variable location remains valid, as
documented in the LiveDebugValues section of SourceLevelDebugging.rst,
and correct termination in DbgEntityHistoryCalculator.

Differential Revision: https://reviews.llvm.org/D59431

llvm-svn: 359426
2019-04-29 09:13:16 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Nick Desaulniers 7ab164c4a4 [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Simon Pilgrim ef54b1dddf [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.
Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format.

llvm-svn: 359326
2019-04-26 17:49:02 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Marcello Maggioni c596584f67 [GlobalISel] Fix inserting copies in the right position for reg definitions
When constrainRegClass is called if the constraining happens on a use the COPY
needs to be inserted before the instruction that contains the MachineOperand,
but if we are constraining a definition it actually needs to be added
after the instruction. In addition, the COPY needs to have its operands
flipped (in the use case we are copying from the old unconstrained register
to the new constrained register, while in the definition case we are copying
from the new constrained register that the instruction defines to the old
unconstrained register).

llvm-svn: 359282
2019-04-26 07:21:56 +00:00
Craig Topper f9c30eddd0 [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector.
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.

This patch falls back to doing the conversion through memory.

Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.

Differential Revision: https://reviews.llvm.org/D61114

llvm-svn: 359219
2019-04-25 18:19:59 +00:00
Jessica Paquette ba55767f51 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette bd7ac30b15 [GlobalISel] Add IRTranslator support for G_FNEARBYINT
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60922

llvm-svn: 359203
2019-04-25 16:39:28 +00:00
Amy Huang 68c9199493 Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61083

llvm-svn: 359149
2019-04-24 23:02:48 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
David Blaikie 832c7d9f36 DebugInfo: Emit only declarations (not whole definitions) of non-unit user defined types into type units
While this doesn't come up in reasonable cases currently (the only user
defined types not in type units are ones without linkage - which makes
for near-ODR violations, because it'd be a type with linkage referencing
a type without linkage - such a type can't be validly defined in more
than one TU, so arguably it shouldn't be in a type unit to begin with -
but it's a convenient way to demonstrate an issue that will become more
revalent with homed modular debug info type definitions - which also
don't need to be in type units but more legitimately so).

Precursor to the Clang change to de-type-unit (by omitting the
'identifier') types homed due to strong linkage vtables. (making that
change without this one would lead to major type duplication in type
units)

llvm-svn: 359122
2019-04-24 18:09:44 +00:00
Bjorn Pettersson 71e8c6f20f Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.

It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.

Reviewers: hfinkel, materi, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61038

llvm-svn: 359072
2019-04-24 06:55:50 +00:00
Francis Visoiu Mistrih 7fee2b89fd [Remarks] Add string deduplication using a string table
* Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section.

* Add parsing support for the string table in the RemarkParser.

From this remark:

```
--- !Missed
Pass:     inline
Name:     NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
            Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee:   printf
  - String:   ' will not be inlined into '
  - Caller:   printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                Line: 6, Column: 0 }
  - String:   ' because its definition is unavailable'
...
```

to:

```
--- !Missed
Pass: 0
Name: 1
DebugLoc: { File: 3, Line: 7, Column: 3 }
Function: 2
Args:
  - Callee:   4
  - String:   5
  - Caller:   2
    DebugLoc: { File: 3, Line: 6, Column: 0 }
  - String:   6
...
```

And the string table in the .remarks/__remarks section containing:

```
inline\0NoDefinition\0printArgsNoRet\0
test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0
will not be inlined into \0 because its definition is unavailable\0
```

This is mostly supposed to be used for testing purposes, but it gives us
a 2x reduction in the remark size, and is an incremental change for the
updates to the remarks file format.

Differential Revision: https://reviews.llvm.org/D60227

llvm-svn: 359050
2019-04-24 00:06:24 +00:00
Francis Visoiu Mistrih 1646851b87 [CGP] Look through bitcasts when duplicating returns for tail calls
The simple case of:

```
int *callee();
void *caller(void *a) {
  if (a == NULL)
    return callee();
  return a;
}
```

would generate a regular call instead of a tail call because we don't
look through the bitcast of the call to `callee` when duplicating the
return blocks.

Differential Revision: https://reviews.llvm.org/D60837

llvm-svn: 359041
2019-04-23 21:57:46 +00:00
Amy Huang fc79ab9857 Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db)
buildbot failure.

This reverts commit d07d6d6177 and
c774f687b6.

llvm-svn: 359034
2019-04-23 21:12:58 +00:00
Jessica Paquette 3cc6d1f542 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
David Blaikie 2f51176223 Reapply: "DebugInfo: Emit only one kind of accelerated access/name table""
Originally committed in r358931
Reverted in r358997

Seems this change made Apple accelerator tables miss names (because
names started respecting the CU NameTableKind GNU & assuming that
shouldn't produce accelerated names too), which is never correct (apple
accelerator tables don't have separators or CU lists - if present, they
must describe all names in all CUs).

Original Description:
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 359026
2019-04-23 19:00:45 +00:00
Jessica Paquette 56342642a0 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
David Blaikie a2470a4653 Revert "DebugInfo: Emit only one kind of accelerated access/name table"
Regresses some apple_names situations - still investigating.

This reverts commit r358931.

llvm-svn: 358997
2019-04-23 15:03:24 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Sanjay Patel 06ff5eae5b [DAGCombiner] generalize binop-of-splats scalarization
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.

Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.

llvm-svn: 358984
2019-04-23 13:16:41 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Chandler Carruth bbddf21f90 Revert "Use const DebugLoc&"
This reverts r358910 (git commit 2b74466530)

While this patch *seems* trivial and safe and correct, it is not. The
copies are actually load bearing copies. You can observe this with MSan
or other ways of checking for use-after-destroy, but otherwise this may
result in ... difficult to debug inexplicable behavior.

I suspect the issue is that the debug location is used after the
original reference to it is removed. The metadata backing it gets
destroyed as its last references goes away, and then we reference it
later through these const references.

llvm-svn: 358940
2019-04-23 01:42:07 +00:00
David Blaikie 68602ab2f3 DebugInfo: Emit only one kind of accelerated access/name table
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 358931
2019-04-22 22:45:11 +00:00
Sanjay Patel bf8aacb715 [SelectionDAG] move splat util functions up from x86 lowering
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.

There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().

llvm-svn: 358930
2019-04-22 22:43:36 +00:00
Matt Arsenault 2b74466530 Use const DebugLoc&
llvm-svn: 358910
2019-04-22 19:14:27 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Sanjay Patel 9bc6c77220 [DAGCombiner] make variable name less ambiguous; NFC
llvm-svn: 358886
2019-04-22 13:42:50 +00:00
Sanjay Patel d6989daae9 [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC
llvm-svn: 358884
2019-04-22 13:36:07 +00:00
Amara Emerson 4286652556 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Fangrui Song d3b2682351 [ExecutionDomainFix] Optimize a binary search insertion
llvm-svn: 358815
2019-04-20 13:00:50 +00:00
Amara Emerson eac69e9377 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette d5c69e0836 [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Jessica Paquette ad69af3e95 [GlobalISel] Add IRTranslator support for G_FRINT
Add it as a simple intrinsic, update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60893

llvm-svn: 358787
2019-04-19 21:46:12 +00:00
Amy Huang d07d6d6177 Attempt to fix buildbot failure in commit 1bb57bac959ac163fd7d8a76d734ca3e0ecee6ab.
llvm-svn: 358786
2019-04-19 21:44:30 +00:00
Amy Huang c774f687b6 [MS] Emit S_HEAPALLOCSITE debug info
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: hans, rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60800

llvm-svn: 358783
2019-04-19 21:09:11 +00:00