Commit Graph

124284 Commits

Matt Arsenault 23f03f5059 AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca
The lifetime intrinsic was erased, which was the next iterator.
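
As a rough standalone sketch of the hazard (illustrative C++ only, not the actual AMDGPUPromoteAlloca code): the crash pattern is erasing the element that a pre-computed "next" iterator points at, then advancing through that stale iterator.

```
#include <iterator>
#include <list>

// Toy worklist standing in for the instruction list: while visiting *it we
// may erase the *following* element, which is exactly where a pre-computed
// 'next' iterator points.
void dropZeroAfterNegative(std::list<int> &worklist) {
  for (auto it = worklist.begin(); it != worklist.end();) {
    auto next = std::next(it);
    if (*it < 0 && next != worklist.end() && *next == 0) {
      // Erasing *next directly would leave 'next' dangling; take erase()'s
      // return value as the new position instead.
      next = worklist.erase(next);
    }
    it = next;
  }
}
```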

llvm-svn: 363668
2019-06-18 12:23:44 +00:00
Matt Arsenault d5ce8ec778 AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale
llvm-svn: 363667
2019-06-18 12:23:42 +00:00
Sjoerd Meijer 7a7009f7c8 [ARM] Some Thumb2ITBlock clean ups. NFC
Some more refactoring, like registering the IT Block pass, less cryptic
variable names, and some simplification of loops.

Differential Revision: https://reviews.llvm.org/D63419

llvm-svn: 363666
2019-06-18 12:13:11 +00:00
Jonas Paulsson 5c64a8c4c6 [SystemZ] Fix AHIMuxK pseudo expansion.
Do not emit a copy if the source and destination registers are the same.

Review: Ulrich Weigand
llvm-svn: 363665
2019-06-18 12:10:02 +00:00
Valery Pykhtin 7e854e1cdd [AMDGPU] Speed up live-in virtual register set computation in GCNScheduleDAGMILive.
Differential revision: https://reviews.llvm.org/D62401

llvm-svn: 363661
2019-06-18 11:43:17 +00:00
Graham Hunter 43854e3ccc [SVE][IR] Scalable Vector IR Type with pr42210 fix
Recommit of D32530 with a few small changes:
  - Stopped recursively walking through aggregates in
    the verifier, so that we don't impose too much
    overhead on large modules under LTO (see PR42210).
  - Changed tests to match; the errors are slightly
    different since they only report the array or
    struct that actually contains a scalable vector,
    rather than all aggregates which contain one in
    a nested member.
  - Corrected an older comment

Reviewers: thakis, rengolin, sdesmalen

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D63321

llvm-svn: 363658
2019-06-18 10:11:56 +00:00
Simon Pilgrim 7dd529e54d [X86] Replace any_extend* vector extensions with zero_extend* equivalents
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.

In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......

Differential Revision: https://reviews.llvm.org/D63326

llvm-svn: 363655
2019-06-18 09:50:13 +00:00
Yevgeny Rouban 69daf4a72d [SimplifyCFG] NFC, prof branch_weights handling is simplified
Using the new SwitchInstProfUpdateWrapper, this patch
simplifies prof branch_weights handling in 3 places.

Differential Revision: https://reviews.llvm.org/D62123

llvm-svn: 363652
2019-06-18 06:50:52 +00:00
Craig Topper f4284f8a9d [X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI
Preliminary step for D59909

llvm-svn: 363645
2019-06-18 04:23:58 +00:00
Craig Topper 587427716c [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.

For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.

llvm-svn: 363644
2019-06-18 03:23:15 +00:00
Craig Topper 8582ecd8d9 [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.

Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.

I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.

I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.

llvm-svn: 363643
2019-06-18 03:23:11 +00:00
Tom Stellard 1f7f64665c GlobalISel: Remove redundant pass initialization
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.

Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar

Reviewed By: aemerson

Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63235

llvm-svn: 363642
2019-06-18 02:05:06 +00:00
Matt Arsenault 5a321b899e GlobalISel: Use the original flags when lowering fneg to fsub
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.

Note the expansion itself isn't correct without nnan, but that should
be fixed separately.

llvm-svn: 363637
2019-06-17 23:48:43 +00:00
Peter Collingbourne d57f7cc15e hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag.
This saves roughly 32 bytes of instructions per function with stack objects
and causes us to preserve enough information that we can recover the original
tags of all stack variables.
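
As a rough sketch of the tag derivation only (standalone C++; the function name is a placeholder, not the actual hwasan runtime or instrumentation code):

```
#include <cstdint>

// Bits [3, 11) of the ring buffer entry address give an 8-bit base tag for
// the stack frame; because the entry address is recoverable, so are the tags.
static inline uint8_t baseTagFromRingBufferEntry(uintptr_t EntryAddr) {
  return static_cast<uint8_t>((EntryAddr >> 3) & 0xff);
}
```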

Now that stack tags are deterministic, we no longer need to pass
-hwasan-generate-tags-with-calls during check-hwasan. This also means that
the new stack tag generation mechanism is exercised by check-hwasan.

Differential Revision: https://reviews.llvm.org/D63360

llvm-svn: 363636
2019-06-17 23:39:51 +00:00
Peter Collingbourne fb9ce100d1 hwasan: Add a tag_offset DWARF attribute to instrumented stack variables.
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.

In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.

Differential Revision: https://reviews.llvm.org/D63119

llvm-svn: 363635
2019-06-17 23:39:41 +00:00
Amara Emerson 146882242f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further, to right before one of the
multiple uses.

One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced to attempt to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

llvm-svn: 363632
2019-06-17 23:20:29 +00:00
Michael Berg f9bff2a55e Propagate fmf in IRTranslate for fneg
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.

Reviewers: volkan, spatel, arsenm

Reviewed By: arsenm

Subscribers: wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D63458

llvm-svn: 363631
2019-06-17 23:19:40 +00:00
Craig Topper 971ad74ba2 Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk.
Removes COPY_TO_REGCLASS from some patterns.

llvm-svn: 363630
2019-06-17 23:08:29 +00:00
Craig Topper 0e18300802 [X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC
Make it clear that only integer types with i32 or smaller elements should get to this part of the code.

llvm-svn: 363629
2019-06-17 23:08:09 +00:00
Stanislav Mekhanoshin 121956108f [AMDGPU] Use custom inserter for gfx10 VOP2b
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.

llvm-svn: 363625
2019-06-17 22:37:37 +00:00
Philip Reames 44475363e8 Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges
This patch really contains two pieces:
    Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
    Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.

Differential Revision: https://reviews.llvm.org/D63224

llvm-svn: 363619
2019-06-17 21:06:17 +00:00
Daniel Sanders 184c8ee920 [globalisel] Fix iterator invalidation in the extload combines
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.

We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
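
The shape of the fix, as a standalone sketch (toy types, not the real combiner code):

```
#include <vector>

struct Use { int UserId; };

// Take a cheap copy of the use list up front; mutations to 'Uses' made while
// combining can then no longer invalidate the iteration or hide an entry.
void visitAllUses(std::vector<Use> &Uses) {
  const std::vector<Use> Snapshot(Uses.begin(), Uses.end());
  for (const Use &U : Snapshot) {
    // ... combine; this may add or remove entries in 'Uses' safely ...
    (void)U;
  }
}
```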

Reviewers: aditya_nandakumar

Reviewed By: aditya_nandakumar

Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61813

llvm-svn: 363616
2019-06-17 20:56:31 +00:00
Stanislav Mekhanoshin 3138278287 [AMDGPU] Propagate function attributes thru bitcasts
AMDGPUPropagateAttributes will not work on function bitcasts,
so move AMDGPUFixFunctionBitcasts before it.

Differential Revision: https://reviews.llvm.org/D63455

llvm-svn: 363614
2019-06-17 20:42:48 +00:00
Philip Reames fe8bd96ebd Fix a bug w/inbounds invalidation in LFTR (recommit)
Recommit r363289 with a bug fix for crash identified in pr42279.  Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case.  Test case previously committed in r363601.

Original commit comment follows:

This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.

The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.

As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.

(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)

Differential Revision: https://reviews.llvm.org/D62939

llvm-svn: 363613
2019-06-17 20:32:22 +00:00
Nicolai Haehnle ae4fcb97dd AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.

This also fixes multi-part shaders for Mesa.

Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82

Reviewers: arsenm, rampitec, t-tye

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63427

llvm-svn: 363602
2019-06-17 19:28:43 +00:00
Joseph Tremoulet daa1ae6142 [EarlyCSE] Fix hashing of self-compares
Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value.  This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.
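
A standalone sketch of the normalization rule (toy predicate enum, not llvm::CmpInst):

```
#include <algorithm>
#include <utility>

enum Pred { ULT = 0, UGT = 1 };
static Pred swapPred(Pred P) { return P == ULT ? UGT : ULT; }

// Canonicalize operand order; for a self-compare (L == R) break the tie by
// taking whichever of {pred, swapped pred} has the lower numeric value, so
// that hashing agrees with an equality check that already treats the two
// forms as equal.
static std::pair<Pred, std::pair<int, int>> normalize(Pred P, int L, int R) {
  if (L > R) {
    std::swap(L, R);
    P = swapPred(P);
  } else if (L == R) {
    P = std::min(P, swapPred(P));
  }
  return {P, {L, R}};
}
```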

Fixes PR 42280.

Reviewers: spatel, efriedma, nikic, fhahn, uabelho

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63349

llvm-svn: 363598
2019-06-17 19:11:28 +00:00
Alina Sbirlea 7a0098aa6e [MemorySSA] Don't use template when the clone is a simplified instruction.
Summary:
LoopRotate doesn't create a faithful clone of an instruction; it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use, or may not be a
memory-altering instruction at all.
Don't rely on the template when the clone may be simplified.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63355

llvm-svn: 363597
2019-06-17 18:58:40 +00:00
Jessica Paquette 49537bbf74 [GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.

When we have something like this:

```
  lhs = G_SUB 0, y
  ...
  G_ICMP lhs, rhs
```

We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.

Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.

Also factor out the CSet emission into emitCSetForICMP.

Differential Revision: https://reviews.llvm.org/D63163

llvm-svn: 363596
2019-06-17 18:40:06 +00:00
Craig Topper f3f968adcd [X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not.
We don't know if it's safe to unfold if we're in 32-bit mode.

This is similar to what was done to some load opcodes in r363523.

I think it's pretty unlikely we will try to unfold these anyway so
I don't think this is testable.

llvm-svn: 363595
2019-06-17 18:38:07 +00:00
Simon Pilgrim 835999e48a [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)
If an XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).

llvm-svn: 363592
2019-06-17 18:20:04 +00:00
Alina Sbirlea 05f77803f4 [MemorySSA] Add all MemoryPhis before filling their values.
Summary:
Add all MemoryPhis in IDF before filling in their incoming
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Without this, the test fails the verification in verifyPrevDefInPhis.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63353

llvm-svn: 363590
2019-06-17 18:16:53 +00:00
Stanislav Mekhanoshin a9191c8492 [AMDGPU] gfx1010 wavefrontsize intrinsic folding
Differential Revision: https://reviews.llvm.org/D63206

llvm-svn: 363588
2019-06-17 17:57:50 +00:00
Matt Arsenault 6d741f29ec AMDGPU: Fold readlane/readfirstlane calls
llvm-svn: 363587
2019-06-17 17:52:35 +00:00
Stanislav Mekhanoshin ad04e7ad42 [AMDGPU] Pass to propagate ABI attributes from kernels to the functions
The pass works in two modes:

Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.

Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.

Differential Revision: https://reviews.llvm.org/D63208

llvm-svn: 363586
2019-06-17 17:47:28 +00:00
Simon Pilgrim bb9adfdb4e [X86][AVX] Split under-aligned vector nt-stores.
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.

llvm-svn: 363582
2019-06-17 17:22:38 +00:00
Warren Ristow 6452bdd29b [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the alignment of the original scalar
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759
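
A purely illustrative sketch of the shape of such a target override (standalone C++; the feature flag, size parameter, and thresholds are assumptions, not the real X86TTIImpl logic, whose hooks take a Type* as shown above):

```
#include <cstdint>

struct MyTTI {
  bool HasSSE41 = true; // assumed subtarget feature flag

  // The real TTI default returns true; a target override like this one
  // narrows it based on alignment and available features.
  bool isLegalNTStore(unsigned DataSizeBytes, unsigned AlignBytes) const {
    // e.g. require natural alignment for a non-temporal vector store
    return AlignBytes >= DataSizeBytes;
  }
  bool isLegalNTLoad(unsigned DataSizeBytes, unsigned AlignBytes) const {
    // e.g. non-temporal vector loads also need SSE4.1
    return HasSSE41 && AlignBytes >= DataSizeBytes;
  }
};
```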

Differential Revision: https://reviews.llvm.org/D61764

llvm-svn: 363581
2019-06-17 17:20:08 +00:00
Matt Arsenault 3e140066bc GlobalISel: Ignore callsite attributes when picking intrinsic type
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.

I fixed the same bug in SelectionDAG in r287593.

llvm-svn: 363580
2019-06-17 17:01:35 +00:00
Matt Arsenault a7f09f3c9e GlobalISel: Verify intrinsics
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.

llvm-svn: 363579
2019-06-17 17:01:32 +00:00
Matt Arsenault fee1949b35 AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic ID
llvm-svn: 363578
2019-06-17 17:01:27 +00:00
Stanislav Mekhanoshin 5d00c3060e [AMDGPU] gfx1010 wave32 metadata
Differential Revision: https://reviews.llvm.org/D63207

llvm-svn: 363577
2019-06-17 16:48:56 +00:00
Tom Stellard 8b1c53b528 AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60640

llvm-svn: 363576
2019-06-17 16:27:43 +00:00
Francis Visoiu Mistrih 34667519dc [Remarks] Extend -fsave-optimization-record to specify the format
Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.

For now, only YAML is supported.

llvm-svn: 363573
2019-06-17 16:06:00 +00:00
Simon Pilgrim 12cb792d7f [X86] combineLoad - begun making the load split code more generic. NFCI.
This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.

This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.

llvm-svn: 363570
2019-06-17 15:54:36 +00:00
Whitney Tsang 15b7f5b72d PHINode: introduce setIncomingValueForBlock() function, and use it.
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
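
A short usage sketch of the new helper (assumes LLVM headers; 'replaceIncoming' and its arguments are placeholder names):

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Previously callers had to pair getBasicBlockIndex() with setIncomingValue():
//   Phi.setIncomingValue(Phi.getBasicBlockIndex(&Pred), NewV);
// The new helper expresses the same intent in one call.
static void replaceIncoming(PHINode &Phi, BasicBlock &Pred, Value *NewV) {
  Phi.setIncomingValueForBlock(&Pred, NewV);
}
```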

Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338

llvm-svn: 363566
2019-06-17 14:38:56 +00:00
Simon Pilgrim 454e6b9010 [X86][SSE] Prevent misaligned non-temporal vector load/store combines
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than that of an xmm we'll just end up using the regular unaligned vector loads anyway.

First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.

Differential Revision: https://reviews.llvm.org/D63246

llvm-svn: 363564
2019-06-17 14:26:10 +00:00
Matt Arsenault 1df203d78e InferAddressSpaces: Fix cloning original addrspacecast
If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also avoids losing the value name.

llvm-svn: 363562
2019-06-17 14:13:29 +00:00
Matt Arsenault b10f097833 AMDGPU: Ignore subtarget for InferAddressSpaces
Even if the target doesn't have flat instructions, addrspace(0) is
still flat. It just happens to not work.

llvm-svn: 363561
2019-06-17 14:13:24 +00:00
Matt Arsenault 29e792659b AMDGPU/GlobalISel: Fix default mapping for non-register operands
Tests will be in future commits when new intrinsics are handled here.

llvm-svn: 363559
2019-06-17 13:52:19 +00:00
Matt Arsenault e683eba0ed AMDGPU: Cleanup custom PseudoSourceValue definitions
Use separate enums for each kind, avoid repeating overloads, and add
missing classof implementation.

llvm-svn: 363558
2019-06-17 13:52:15 +00:00
Sam Parker 1bd3d00e7e [CodeGen] Check for HardwareLoop Latch ExitBlock
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.

Differential Revision: https://reviews.llvm.org/D63336

llvm-svn: 363556
2019-06-17 13:39:28 +00:00
Bjorn Pettersson 83773b77a5 [LV] Deny irregular types in interleavedAccessCanBeWidened
Summary:
Avoid having the loop vectorizer create loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

llvm-svn: 363547
2019-06-17 12:02:24 +00:00
Luis Marques 2e46312ffd [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
Some GEPs were not being split, presumably because that split would just be 
undone by the DAGCombiner. Not performing those splits can prevent important 
optimizations, such as preventing the element indices / member offsets from 
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP 
  are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
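
To make the motivation concrete, an illustrative source-level example (C++, not taken from the patch): the constant member offset below can only become a load-immediate offset if the GEP's constant part survives as a separate addend.

```
struct Elem { long key; long value; };

// Address = table + i*sizeof(Elem) + offsetof(Elem, value); splitting the GEP
// keeps the constant offsetof() part available as a load-immediate offset.
long lookupValue(const Elem *table, long i) {
  return table[i].value;
}
```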

Differential Revision: https://reviews.llvm.org/D60294

llvm-svn: 363544
2019-06-17 10:54:12 +00:00
Fangrui Song 5401c2db6e Fix clang -Wcovered-switch-default after stack-id change by D60137
llvm-svn: 363543
2019-06-17 10:20:20 +00:00
Simon Pilgrim ef78e55205 [SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.

llvm-svn: 363542
2019-06-17 10:14:52 +00:00
Sam Parker 60d6fb2a63 [SCEV] Use NoWrapFlags when expanding a simple mul
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363540
2019-06-17 10:05:18 +00:00
Fangrui Song 4bde5d3c08 [ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265
llvm-svn: 363535
2019-06-17 09:29:50 +00:00
Fangrui Song 89d6905c59 [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265
llvm-svn: 363534
2019-06-17 09:26:50 +00:00
Sander de Smalen 5d6ee76c16 Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

llvm-svn: 363533
2019-06-17 09:13:29 +00:00
Sam Parker a059efa885 [ARM] Remove ARMComputeBlockSize
Forgot to remove file!

llvm-svn: 363532
2019-06-17 09:13:10 +00:00
Sam Parker f7c0b3aeb2 [ARM] Add ARMBasicBlockInfo.cpp
Forgot to add file!

llvm-svn: 363531
2019-06-17 09:05:43 +00:00
Sam Parker 966f4e874e [ARM] Extract some code from ARMConstantIslandPass
Create the ARMBasicBlockUtils class for tracking and querying basic
blocks sizes so we can use them when generating low-overhead loops.

Differential Revision: https://reviews.llvm.org/D63265

llvm-svn: 363530
2019-06-17 08:49:09 +00:00
Hans Wennborg a9e5d2f35d Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
Third time's the charm.

This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.

llvm-svn: 363529
2019-06-17 07:47:28 +00:00
Yevgeny Rouban ee62c40eae [SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.

Differential Revision: https://reviews.llvm.org/D62186

llvm-svn: 363527
2019-06-17 05:55:12 +00:00
Justin Hibbits 1d1cf30b73 PowerPC: Optimize SPE double parameter calling setup
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types.  This is all handled by the target-independent layer.  However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads back from it, on either side.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

    evstdd      5, X(1)
    lwz         3, X(1)
    lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

    stw         3, X(1)
    stw         4, X+4(1)
    evldd       5, X(1)

This optimizes the fence to use SPE instructions.  Now, to pass a double
to a function:

    mr          4, 5
    evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

    evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54583

llvm-svn: 363526
2019-06-17 03:15:23 +00:00
Craig Topper 9f2f127009 [X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not.
It would not be safe to unfold the memory form to the register form
without checking that we are compiling for 64-bit mode.

This probably isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.

llvm-svn: 363523
2019-06-16 22:33:09 +00:00
Roman Lebedev 5a663bd77a [InstSimplify] Fix addo/subo undef folds (PR42209)
Fix folds of addo and subo with an undef operand to be:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
 as per LLVM undef rules.
Same for commuted variants.

Based on the original version of the patch by @nikic.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]

Differential Revision: https://reviews.llvm.org/D63065

llvm-svn: 363522
2019-06-16 20:39:45 +00:00
Nicolai Haehnle 41abf2766e AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

llvm-svn: 363517
2019-06-16 17:43:37 +00:00
Nicolai Haehnle 6d71be4e67 AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

llvm-svn: 363516
2019-06-16 17:32:01 +00:00
Nicolai Haehnle 490e83cd43 AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic
Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62486

Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
llvm-svn: 363514
2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin 5250021672 [AMDGPU] gfx10 conditional registers handling
This is the C++ source part of wave32 support, excluding the overridden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Sanjay Patel c8d88ad1a9 [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel d14389c0a5 [x86] split 256-bit vector selects if operands are vector concats
This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

llvm-svn: 363508
2019-06-16 14:04:49 +00:00
Simon Pilgrim fcffc2facc [X86] CombineShuffleWithExtract - handle cases with different vector extract sources
Insert the shorter vector source into an undef vector of the longer vector source's type.

llvm-svn: 363507
2019-06-16 08:00:41 +00:00
Simon Pilgrim 456ca5d7f7 [X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI.
llvm-svn: 363501
2019-06-15 19:12:44 +00:00
Simon Pilgrim 90e87af303 [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.

Fixes PR34380

llvm-svn: 363500
2019-06-15 18:30:43 +00:00
Simon Pilgrim 990f3ceb67 [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)

llvm-svn: 363499
2019-06-15 17:05:24 +00:00
Aaron Puchert e1dc495e63 [Clang] Harmonize Split DWARF options with llc
Summary:
With Split DWARF the resulting object file (then called skeleton CU)
contains the file name of another ("DWO") file with the debug info.
This can be a problem for remote compilation, as it will contain the
name of the file on the compilation server, not on the client.

To use Split DWARF with remote compilation, one needs to either

* make sure only relative paths are used, and mirror the build directory
  structure of the client on the server,
* inject the desired file name on the client directly.

Since llc already supports the latter solution, we're just copying that
over. We allow setting the actual output filename separately from the
value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.

Fixes PR40276.

Reviewers: dblaikie, echristo, tejohnson

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D59673

llvm-svn: 363496
2019-06-15 15:38:51 +00:00
Kang Zhang 2d51adcb57 [PowerPC] Set the innermost hot loop to align 32 bytes
Summary:
If the nested loop is an innermost loop, prefer a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.

The old code only aligned a hot loop to 32 bytes when the LoopSize was larger
than 16 bytes and smaller than 32 bytes; this patch aligns the innermost hot
loop to 32 bytes, not only hot loops whose size is 16~32 bytes.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D61228

llvm-svn: 363495
2019-06-15 15:10:24 +00:00
Gauthier Harnisch 83c7b61052 [clang] Add storage for APValue in ConstantExpr
Summary:
When using ConstantExpr we often need the result of the expression to be kept in the AST. Currently this is done ad hoc by the node that needs the result, and has been done multiple times (for enumerators, for constexpr variables, ...). This patch adds to ConstantExpr the ability to store the result of evaluating the expression. No functional changes expected.

Changes:
 - Add a trailing object to ConstantExpr that can hold an APValue or a uint64_t. The uint64_t is here because most ConstantExprs yield integral values, so there is an optimized layout for integral values.
 - Add basic* serialization support for the trailing result.
 - Move the conversion functions from an enum to a fltSemantics from clang::FloatingLiteral to llvm::APFloatBase. This change is to make them usable for serializing APValues.
 - Add basic* Import support for the trailing result.
 - ConstantExpr created in CheckConvertedConstantExpression now stores the result in the ConstantExpr Node.
 - Adapt AST dump to print the result when present.

basic* : None, Indeterminate, Int, Float, FixedPoint, ComplexInt, ComplexFloat,
the result is not yet used anywhere but for -ast-dump.

Reviewers: rsmith, martong, shafik

Reviewed By: rsmith

Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62399

llvm-svn: 363493
2019-06-15 10:24:47 +00:00
Fangrui Song b6dc09e725 [BranchProbability] Delete a redundant overflow check
llvm-svn: 363492
2019-06-15 10:09:59 +00:00
Nikita Popov 8550fb386a [SCEV] Use unsigned/signed intersection type in SCEV
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

llvm-svn: 363490
2019-06-15 09:15:52 +00:00
Nikita Popov 9145562b48 [SimplifyIndVar] Simplify non-overflowing saturating add/sub
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.
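
A standalone worked illustration of the idea (not the SimplifyIndVar code):

```
#include <cstdint>
#include <limits>

// A saturating add is just a plain add whenever overflow is impossible.
static uint32_t satAdd(uint32_t A, uint32_t B) {
  return A > std::numeric_limits<uint32_t>::max() - B
             ? std::numeric_limits<uint32_t>::max()
             : A + B;
}

uint64_t sumSat() {
  uint64_t Sum = 0;
  // The IV stays in [0, 1000), so I + 5 can never overflow a uint32_t and the
  // saturating add is replaceable by the simple 'I + 5'.
  for (uint32_t I = 0; I < 1000; ++I)
    Sum += satAdd(I, 5); // simplifiable to: Sum += I + 5;
  return Sum;
}
```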

Differential Revision: https://reviews.llvm.org/D62792

llvm-svn: 363489
2019-06-15 08:48:52 +00:00
Fangrui Song 44cc4e9351 [RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC
llvm-svn: 363486
2019-06-15 06:14:15 +00:00
Michael Berg ad6bb86b2d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song 968b5f84af Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Matt Arsenault 9487278010 Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

llvm-svn: 363478
2019-06-15 00:33:26 +00:00
Mitch Phillips 0d44f129bb Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de0.
This reverts rL363410.

llvm-svn: 363476
2019-06-14 23:45:34 +00:00
Michael Berg 69394bedc5 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Guozhi Wei d2210af332 [MBP] Move a latch block with conditional exit and multi predecessors to top of loop
The current findBestLoopTop can find and move one kind of block to the top: a latch block with one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it; all other predecessors need a jump to it, followed by another conditional jump to the loop header. If it is moved before the loop header, all its predecessors jump to it and then fall through to the loop header, so every predecessor except P saves one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471
2019-06-14 23:08:59 +00:00
Akira Hatanaka a704a8f28c [ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

llvm-svn: 363468
2019-06-14 22:06:32 +00:00
Matt Arsenault aa41e92e17 AMDGPU: Avoid most waitcnts before calls
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.

llvm-svn: 363465
2019-06-14 21:52:26 +00:00
Ziang Wan af857b93df Add --print-supported-cpus flag for clang.
This patch allows clang users to print out a list of supported CPU models using
clang [--target=<target triple>] --print-supported-cpus

Then, users can select the CPU model to compile to using
clang --target=<triple> -mcpu=<model> a.c

It is a handy feature to help cross compilation.

llvm-svn: 363464
2019-06-14 21:42:21 +00:00
Matt Arsenault 282dac717e SROA: Allow eliminating addrspacecasted allocas
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

llvm-svn: 363462
2019-06-14 21:38:31 +00:00
Jinsong Ji bbab7acedf [PowerPC][NFC] Comments update and remove some unused def
llvm-svn: 363461
2019-06-14 21:33:51 +00:00
Matt Arsenault 9e5fa33378 AMDGPU: Fix dropping memref for ds append/consume
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.

llvm-svn: 363455
2019-06-14 21:01:24 +00:00
Matt Arsenault 1c5a87956f AMDGPU: Set isTrap on S_TRAP
This seems to only be used for generating some kind
of documentation, but might as well set it.

llvm-svn: 363454
2019-06-14 21:01:24 +00:00
Lang Hames 1b091540d2 [JITLink] Move JITLinkMemoryManager into its own header.
llvm-svn: 363444
2019-06-14 19:41:21 +00:00
Francis Visoiu Mistrih 0b0851399e [Remarks] Use the RemarkSetup error in setupOptimizationRemarks
Added the errors in r363415 but they were not used in the
RemarkStreamer.

llvm-svn: 363439
2019-06-14 18:18:26 +00:00
Amara Emerson f79d3bc724 [GlobalISel] Add a G_BRJT opcode.
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.

We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.

Differential Revision: https://reviews.llvm.org/D63159

llvm-svn: 363434
2019-06-14 17:55:48 +00:00
Florian Hahn dcdd12b68c Revert Fix a bug w/inbounds invalidation in LFTR
Reverting because it breaks a green dragon build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

This reverts r363289 (git commit eb88badff9)

llvm-svn: 363427
2019-06-14 17:23:09 +00:00
Florian Hahn a19809045c Revert [LFTR] Stylistic cleanup as suggested in last review comment of D62939 [NFC]
Reverting because it depends on r363289, which breaks a green dragon build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

This reverts r363292 (git commit 42a3fc133d)

llvm-svn: 363426
2019-06-14 17:22:56 +00:00
Florian Hahn e1b4b1b46e Revert [LFTR] Rename variable to minimize confusion [NFC]
Reverting because it depends on r363289, which breaks a green dragon
build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

This reverts r363293 (git commit c37be29634)

llvm-svn: 363425
2019-06-14 17:22:49 +00:00
Jinsong Ji c9e3dbb0a5 [PowerPC][NFC] Format comments in P9InstrResources.td
llvm-svn: 363423
2019-06-14 17:04:24 +00:00
Shawn Landden f2e60fc4e8 [SimplifyCFG] NFC intended, remove GCD that was only used for powers of two
and replace with an equivalent countTrailingZeros.

GCD is much more expensive than this, with repeated division.
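
A standalone check of the equivalence being relied on (illustrative only; uses a GCC/Clang builtin):

```
#include <algorithm>
#include <cstdint>

// When the values of interest are powers of two, their GCD is determined
// entirely by the minimum number of trailing zeros, so a countTrailingZeros
// scan replaces the repeated-division GCD.
static unsigned ctz64(uint64_t X) { return X ? __builtin_ctzll(X) : 64; }

uint64_t powerOfTwoGCD(const uint64_t *Vals, unsigned N) {
  unsigned MinTZ = 64;
  for (unsigned I = 0; I < N; ++I)
    MinTZ = std::min(MinTZ, ctz64(Vals[I]));
  return MinTZ >= 64 ? 0 : (uint64_t{1} << MinTZ); // equals gcd when all 2^k
}
```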

This depends on D60823

Differential Revision: https://reviews.llvm.org/D61151

llvm-svn: 363422
2019-06-14 16:56:49 +00:00
Valery Pykhtin ffeb01c113 [AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check
Summary: Function bodies marked inline in an OpenCL source are eliminated, but the MaxBB check may prevent inlining them, leaving undefined references.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63337

llvm-svn: 363418
2019-06-14 16:37:33 +00:00
Kevin P. Neal fece7c6c83 [FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Craig Topper, Kevin P. Neal
Approved by:	Craig Topper
Differential Revision:	https://reviews.llvm.org/D63271

llvm-svn: 363417
2019-06-14 16:28:55 +00:00
Stanislav Mekhanoshin cdf339266b [AMDGPU] gfx1010 BoolReg definition. NFC.
An earlier commit added AMDGPUOperand::isBoolReg(). It turns out
gcc issues a warning about an unused function, since D63204 is not
yet submitted.

Added NFC part of D63204 to have a use of that function and
mute the warning.

llvm-svn: 363416
2019-06-14 16:25:46 +00:00
Francis Visoiu Mistrih 7a21113ce8 Reland: [Remarks] Refactor optimization remarks setup
* Add a common function to setup opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors

llvm-svn: 363415
2019-06-14 16:20:51 +00:00
Matt Arsenault c2864c0de0 GlobalISel: Avoid producing Illegal copies in RegBankSelect
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.

The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.

For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.

llvm-svn: 363410
2019-06-14 15:22:25 +00:00
Sanjay Patel 7ea378b940 [CodeGenPrepare] propagate debuginfo when copying a shuffle
llvm-svn: 363409
2019-06-14 15:05:35 +00:00
Johannes Doerfert 282d34ee78 [Attributor] Disable the Attributor by default and fix a comment
llvm-svn: 363408
2019-06-14 14:53:41 +00:00
Matt Arsenault 492d71cc99 AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.

llvm-svn: 363406
2019-06-14 14:51:26 +00:00
Mikhail Maltsev d1cc2e1543 [ARM] Add MVE horizontal accumulation instructions
This is the family of vector instructions that combine all the lanes
in their input vector(s), and output a value in one or two GPRs.

Differential Revision: https://reviews.llvm.org/D62670

llvm-svn: 363403
2019-06-14 14:31:13 +00:00
Eugene Leviant c74910b842 Fix failing test on ARM buildbot
r363261 caused a test failure on a 32-bit ARM buildbot
because of unsigned integer overflow. This patch
fixes it by changing the offset type from size_t to uint64_t.

llvm-svn: 363393
2019-06-14 13:45:21 +00:00
Matt Arsenault 731a81598e RegBankSelect: Remove checks for invalid mappings
Avoid a check for validity and a set of redundant asserts. The place
where the InstructionMapping is constructed already asserts that all of
the default fields are passed for an invalid mapping, so don't
overcomplicate this.

llvm-svn: 363391
2019-06-14 13:42:40 +00:00
Matt Arsenault 5a86dbcf30 AMDGPU: Fix input chain when gluing copies to m0
I don't think this was causing any observable issues, but was making
reading the DAG dump confusing.

llvm-svn: 363389
2019-06-14 13:33:36 +00:00
Andrea Di Biagio 6b78e4d0a4 [MCA] Ignore invalid processor resource writes of zero cycles. NFCI
In debug mode, the tool also raises a warning and prints out a message which
helps identify the problematic MCWriteProcResEntry from the scheduling class.
This message would have been useful to have when triaging PR42282.

llvm-svn: 363387
2019-06-14 13:31:21 +00:00
Matt Arsenault 3062e87a1e Fix not calling TargetCustom PSVs printer
If the enum value was greater than the starting target custom value,
the custom printer wasn't called.

llvm-svn: 363386
2019-06-14 13:26:34 +00:00
Matt Arsenault d3c84e6719 AMDGPU: Refactor to prepare for manually selecting more intrinsics
llvm-svn: 363385
2019-06-14 13:26:32 +00:00
Matt Arsenault 74d67c2086 AMDGPU: Fix printing trailing whitespace after s_endpgm
llvm-svn: 363384
2019-06-14 13:26:29 +00:00
Matt Arsenault 642f39c93e AMDGPU: Fix missing const
llvm-svn: 363383
2019-06-14 13:26:23 +00:00
Sjoerd Meijer 3058a62b90 [ARM] MVE VPT Block Pass
Initial commit of a new pass to create vector predication blocks, called VPT
blocks, that are supported by the Armv8.1-M MVE architecture.

This is a first naive implementation: for 2 consecutive predicated
instructions I1 and I2, for example, it will generate 2 VPT blocks:

VPST
I1
VPST
I2

A more optimal implementation would obviously put instructions in the same VPT
block when they are predicated on the same condition and when it is allowed to
do this:

VPTT
I1
I2

We will address this optimisation with follow up patches when the groundwork is
in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I
added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis
that we need for the more optimal implementation.

VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes
cannot interact with each other. Instructions allowed in VPT blocks must
be MVE instructions that are marked as VPT compatible.

Differential Revision: https://reviews.llvm.org/D63247

llvm-svn: 363370
2019-06-14 11:46:05 +00:00
George Rimar cfa1a62a4c [yaml2obj] - Allow setting custom Flags for implicit sections.
With this patch we get the ability to set any flags we want
for implicit sections defined in YAML.

Differential revision: https://reviews.llvm.org/D63136

llvm-svn: 363367
2019-06-14 11:01:14 +00:00
Sam Parker 0cf9639a9c [SCEV] Pass NoWrapFlags when expanding an AddExpr
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.

This is the first re-commit of the functional changes from rL362687,
which was previously reverted.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363364
2019-06-14 09:19:41 +00:00
Eric Christopher 5e83d8fff4 Move commentary on opcode translation for code16 mov instructions
to segment registers closer to the segment register check for when
we add further optimizations.

llvm-svn: 363355
2019-06-14 04:51:55 +00:00
David Blaikie 4129e3e0f8 DebugInfo: Include enumerators in pubnames
This is consistent with GCC's behavior (which is the defacto standard
for pubnames). Though I find the presence of enumerators from enum
classes to be a bit confusing, possibly a bug on GCC's end (since they
can't be named unqualified, unlike the other names - and names nested in
classes don't go in pubnames, for instance - presumably because one must
name the class first & that's enough to limit the scope of the search)

llvm-svn: 363349
2019-06-14 01:58:56 +00:00
Stanislav Mekhanoshin c43e67bfff [AMDGPU] gfx1011/gfx1012 targets
Differential Revision: https://reviews.llvm.org/D63307

llvm-svn: 363344
2019-06-14 00:33:31 +00:00
Francis Visoiu Mistrih e4147ea1ef Revert "[Remarks] Refactor optimization remarks setup"
This reverts commit 6e6e3af55b.

This breaks greendragon.

llvm-svn: 363343
2019-06-14 00:05:56 +00:00
Vedant Kumar 1e4882c890 [Coverage] Speculative fix for r363325 for an older compiler
It looks like an older version of gcc can't figure out that it needs to
move a unique_ptr while implicitly constructing an Expected object.

llvm-svn: 363342
2019-06-14 00:03:22 +00:00
Stanislav Mekhanoshin 68a2fef9ae [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301

llvm-svn: 363339
2019-06-13 23:47:36 +00:00
Amy Huang 49275272e3 Use fully qualified name when printing S_CONSTANT records
Summary:
Before it was using the fully qualified name only for static data members.
Now it does so for all variable names, to match MSVC.

Reviewers: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63012

llvm-svn: 363335
2019-06-13 22:53:43 +00:00
Peter Collingbourne 0feb6e52f1 Symbolize: Remove dead code. NFCI.
The only caller of SymbolizableObjectFile::create passes a non-null
DebugInfoContext and asserts that they do so. Move the assert into
SymbolizableObjectFile::create and remove null checks.

Differential Revision: https://reviews.llvm.org/D63298

llvm-svn: 363334
2019-06-13 22:49:34 +00:00
Amara Emerson fb0a40f064 [GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block.
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLocs attached to these MIs are from
a user instruction that could be in a different block.

Fixes PR40887.

Differential Revision: https://reviews.llvm.org/D63286

llvm-svn: 363331
2019-06-13 22:15:35 +00:00
Craig Topper cf34a2bd5d [X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ATTR_VEXL/ATTR_EVEXL bits. NFCI
Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes.

Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately.

llvm-svn: 363330
2019-06-13 22:15:25 +00:00
Jinsong Ji 1c88445840 [MachinePipeliner] Don't check boundary node in checkValidNodeOrder
This was exposed by PowerPC target enablement.

In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.

When we build NodeOrder in the Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we do not skip
them, so we still assume all the nodes have been sorted and are in order in
the Indices array. So when we call lower_bound() for ExitSU, it returns
Indices.end(), causing memory issues in the following Node access.

Differential Revision: https://reviews.llvm.org/D63282

llvm-svn: 363329
2019-06-13 21:51:12 +00:00
Francis Visoiu Mistrih 6e6e3af55b [Remarks] Refactor optimization remarks setup
* Add a common function to setup opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors

llvm-svn: 363328
2019-06-13 21:46:57 +00:00
Vedant Kumar 901d04fc6d [Coverage] Load code coverage data from archives
Support loading code coverage data from regular archives, thin archives,
and from MachO universal binaries which contain archives.

Testing: check-llvm, check-profile (with {A,UB}San enabled)

rdar://51538999

Differential Revision: https://reviews.llvm.org/D63232

llvm-svn: 363325
2019-06-13 20:48:57 +00:00
Stanislav Mekhanoshin ccecd22db9 [AMDGPU] gfx1010 AMDGPUSetCCOp definition
It was missing from D63293 and breaks a debug tablegen build without
this part.

llvm-svn: 363323
2019-06-13 20:23:02 +00:00
Lang Hames 2f8c6f9362 [ORC] Rename MaterializationResponsibility resolve and emit methods to
notifyResolved/notifyEmitted.

The 'notify' prefix better describes what these methods do: they update the JIT
symbol states and notify any pending queries that the 'resolved' and 'emitted'
states have been reached (rather than actually performing the resolution or
emission themselves). Since new states are going to be introduced in the near
future (to track symbol registration/initialization) it's worth changing the
convention pre-emptively to avoid further confusion.

llvm-svn: 363322
2019-06-13 20:11:23 +00:00
Nikita Popov ad81d427ca [LangRef] Clarify poison semantics
I find the current documentation of poison somewhat confusing,
mainly because its use of "undefined behavior" doesn't seem to
align with our usual interpretation (of immediate UB). Especially
the sentence "any instruction that has a dependence on a poison
value has undefined behavior" is very confusing.

Clarify poison semantics by:

 * Replacing the introductory paragraph with the standard rationale
   for having poison values.
 * Spelling out that instructions depending on poison return poison.
 * Spelling out how we go from a poison value to immediate undefined
   behavior and give the two examples we currently use in ValueTracking.
 * Spelling out that side effects depending on poison are UB.

Differential Revision: https://reviews.llvm.org/D63044

llvm-svn: 363320
2019-06-13 19:45:36 +00:00
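As an illustrative aside on the poison clarification above (D63044), the rules roughly play out in IR as follows; this is a sketch, not wording or an example taken from the patch:

  define i32 @poison_example(i32 %a, i32 %b) {
    ; If %a + %b overflows, the nsw flag makes %sum poison.
    %sum = add nsw i32 %a, %b
    ; Instructions that depend on poison produce poison; this alone is not UB.
    %cmp = icmp sgt i32 %sum, 0
    ; But branching on a poison condition (or feeding poison to a side effect)
    ; is immediate undefined behavior when the overflow actually happens.
    br i1 %cmp, label %pos, label %nonpos
  pos:
    ret i32 1
  nonpos:
    ret i32 0
  }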
Philip Reames 038e01dc9a Add a clarifying comment about branching on poison
I recently got this wrong (again), and I'm sure I'm not the only one.  Put a comment in the logical place someone would look when trying to "fix" the obvious "missed optimization" which arises from the common misunderstanding.  Hopefully, this will save others time.  :)

llvm-svn: 363318
2019-06-13 19:27:56 +00:00
Stanislav Mekhanoshin 8bcc9bb595 [AMDGPU] gfx1010 base changes for wave32
Differential Revision: https://reviews.llvm.org/D63293

llvm-svn: 363299
2019-06-13 19:18:29 +00:00
Philip Reames c37be29634 [LFTR] Rename variable to minimize confusion [NFC]
As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop.  A conditional backedge taken count - one which only applies if a particular exit is taken - is called an ExitCount in SCEV code, so be consistent here.

llvm-svn: 363293
2019-06-13 18:40:15 +00:00
Philip Reames 42a3fc133d [LFTR] Stylistic cleanup as suggested in last review comment of D62939 [NFC]
llvm-svn: 363292
2019-06-13 18:32:55 +00:00
Philip Reames eb88badff9 Fix a bug w/inbounds invalidation in LFTR
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.

The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying.  As such, our potential UB triggering use does not change the semantics of the original program.

As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case: this is about poison, that's about undef. Unfortunately, the two are different; see Nikita's comment for a fuller explanation - he explains it well.

(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)

Differential Revision: https://reviews.llvm.org/D62939

llvm-svn: 363289
2019-06-13 18:23:13 +00:00
Leonard Chan 09f56b51ec [clang][NewPM] Fix broken -O0 test from missing assumptions
Add an AssumptionCache callback to the InlineFuntionInfo used for the
AlwaysInlinerPass to match codegen of the AlwaysInlinerLegacyPass to generate
llvm.assume. This fixes CodeGen/builtin-movdir.c when new PM is enabled by
default.

Differential Revision: https://reviews.llvm.org/D63170

llvm-svn: 363287
2019-06-13 18:18:40 +00:00
David Bolvansky 896ece41e4 [Codegen] Merge tail blocks with no successors after block placement
Summary:
I found the following case, in which tail blocks with no successors present merging opportunities after block placement.

Before block placement:

bb0:
    ...
    bne a0, 0, bb2:

bb1:
    mv a0, 1
    ret 

bb2:
    ...

bb3:
    mv a0, 1
    ret

bb4:
    mv a0, -1
    ret

The conditional branch bne in bb0 is opposite to beq.

After block placement:

bb0:
    ...
    beq a0, 0, bb1

bb2:
    ...

bb4:
    mv a0, -1
    ret

bb1:
    mv a0, 1
    ret

bb3:
    mv a0, 1
    ret

After block placement, a new tail-merging opportunity appears: bb1 and bb3 can be merged into one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiments for RISC-V, it decreases code size.


Author of original patch: Jim Lin

Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen

Reviewed By: Jim, dmgreen

Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54411

llvm-svn: 363284
2019-06-13 18:11:32 +00:00
Stanislav Mekhanoshin 2bda177da0 [AMDGPU] ImmArg and SourceOfDivergence for permlane/dpp
Added missing ImmArg and SourceOfDivergence to the crosslane
intrinsics.

Differential Revision: https://reviews.llvm.org/D63216

llvm-svn: 363276
2019-06-13 16:31:51 +00:00
Joseph Tremoulet 3bc6e2a7aa [EarlyCSE] Ensure equal keys have the same hash value
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`.  Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes.  This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.

This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.

It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again.  A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.

Reviewers: spatel, nikic

Reviewed By: spatel, nikic

Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62644

llvm-svn: 363274
2019-06-13 15:24:11 +00:00
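To make the equivalence above concrete, here is a rough IR rendering (illustrative only, not taken from the patch's tests):

  ; Both of these compute smin(%x, %y) and must receive the same hash:
  %c1 = icmp sgt i32 %x, %y
  %m1 = select i1 %c1, i32 %y, i32 %x

  %c2 = icmp sgt i32 %x, %y
  %not = xor i1 %c2, true              ; the 'not' of the predicate
  %m2 = select i1 %not, i32 %x, i32 %y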
Simon Pilgrim 757a2f13fd [X86] Use fresh MemOps when emitting VAARG64
Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on.

Committed on behalf of @luke (Luke Lau)

Differential Revision: https://reviews.llvm.org/D62726

llvm-svn: 363268
2019-06-13 14:05:37 +00:00
David Stenberg 1278a19282 Remove ';' after namespace's closing bracket [NFC]
llvm-svn: 363267
2019-06-13 14:02:55 +00:00
Diogo N. Sampaio 0be2d25ecc [FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472

The shrink wrapping pass prematurely restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.

As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.

Reviewers: dmgreen, qcolombet

Reviewed By: qcolombet

Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63152

llvm-svn: 363265
2019-06-13 13:56:19 +00:00
Eugene Leviant 407c8f1f49 Extra error checking to ARMAttributeParser
The patch checks for subsection length as discussed in D63191

llvm-svn: 363260
2019-06-13 13:25:20 +00:00
Jeremy Morse d2cd9c23b4 [NFC] Sink a function call into LiveDebugValues::process
This was requested in D62904, which I successfully missed. This is just
a refactor and shouldn't change any behaviour.

llvm-svn: 363259
2019-06-13 13:11:57 +00:00
Simon Tatham 286e1d2c2d [ARM] Set up infrastructure for MVE vector instructions.
This commit prepares the way to start adding the main collection of
MVE instructions, which operate on the 128-bit vector registers.

The most obvious thing that's needed, and the simplest, is to add the
MQPR register class, which is like the existing QPR except that it has
fewer registers in it.

The more complicated part: MVE defines a system of vector predication,
in which instructions operating on 128-bit vector registers can be
constrained to operate on only a subset of the lanes, using a system
of prefix instructions similar to the existing Thumb IT, in that you
have one prefix instruction which designates up to 4 following
instructions as subject to predication, and within that sequence, the
predicate can be inverted by means of T/E suffixes ('Then' / 'Else').

To support instructions of this type, we've added two new Tablegen
classes `vpred_n` and `vpred_r` for standard clusters of MC operands
to add to a predicated instruction. Both include a flag indicating how
the instruction is predicated at all (options are T, E and 'not
predicated'), and an input register field for the register controlling
the set of active lanes. They differ from each other in that `vpred_r`
also includes an input operand for the previous value of the output
register, for instructions that leave inactive lanes unchanged.
`vpred_n` lacks that extra operand; it will be used for instructions
that don't preserve inactive lanes in their output register (either
because inactive lanes are zeroed, as the MVE load instructions do, or
because the output register isn't a vector at all).

This commit also adds the family of prefix instructions themselves
(VPT / VPST), and all the machinery needed to work with them in
assembly and disassembly (e.g. generating the 't' and 'e' mnemonic
suffixes on disassembled instructions within a predicated block)

I've added a couple of demo instructions that derive from the new
Tablegen base classes and use those two operand clusters. The bulk of
the vector instructions will come in followup commits small enough to
be manageable. (One exception is that I've added the full version of
`isMnemonicVPTPredicable` in the AsmParser, because it seemed
pointless to carefully split it up.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62669

llvm-svn: 363258
2019-06-13 13:11:13 +00:00
Simon Pilgrim 6b56ad164c [CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator helper wrapper. NFCI.
Pre-commit for D62726 on behalf of @luke (Luke Lau)

llvm-svn: 363257
2019-06-13 12:58:55 +00:00
Jeremy Morse bf2b2f08b0 [DebugInfo] Honour variable fragments in LiveDebugValues
This patch makes the LiveDebugValues pass consider fragments when propagating
DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable
location is added to the open-ranges key, which allows distinct fragments to be
tracked separately. To handle overlapping fragments things become slightly
funkier. To avoid excessive searching for overlaps in the data-flow part of
LiveDebugValues, this patch:
 * Pre-computes pairings of fragments that overlap, for each DILocalVariable
 * During data-flow, whenever something happens that causes an open range to
   be terminated (via erase), any fragments pre-determined to overlap are
   also terminated.

The effect of which is that when encountering a DBG_VALUE fragment that
overlaps others, the overlapped fragments do not get propagated to other
blocks. We still rely on later location-list building to correctly handle
overlapping fragments within blocks.

It's unclear whether a mixture of DBG_VALUEs with and without fragmented
expressions is legitimate. To avoid surprises, this patch interprets a
DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment.

Differential Revision: https://reviews.llvm.org/D62904

llvm-svn: 363256
2019-06-13 12:51:57 +00:00
Dmitry Preobrazhensky 1fca3b1972 [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61125

llvm-svn: 363255
2019-06-13 12:46:37 +00:00
Eugene Leviant b00dbcbb43 [ThinLTO][Bitcode] Add 'entrycount' to FS_COMBINED_PROFILE. NFC
Differential revision: https://reviews.llvm.org/D63078

llvm-svn: 363254
2019-06-13 12:33:26 +00:00
Simon Pilgrim 0baf136a4d [X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases.
Based on fuzz test from @craig.topper

llvm-svn: 363251
2019-06-13 11:26:21 +00:00
Nikola Prica 076ae0d2e2 [DebugInfo] Move Value struct out of DebugLocEntry as DbgValueLoc (NFC)
Since the DebugLocEntry::Value is used as part of DwarfDebug and
DebugLocEntry make it as the separate class.

Reviewers: aprantl, dstenb

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D63213

llvm-svn: 363246
2019-06-13 10:23:26 +00:00
Jeremy Morse 181bf0cefb [DebugInfo] Use FrameDestroy to extend stack locations to end-of-function
We aim to ignore changes in variable locations during the prologue and
epilogue of functions, to avoid using space documenting location changes
that aren't visible. However in D61940 / r362951 this got ripped out as
the previous implementation was unsound.

Instead, use the FrameDestroy flag to identify when we're in the epilogue
of a function, and ignore variable location changes accordingly. This fits
in with existing code that examines the FrameSetup flag.

Some variable locations get shuffled in modified tests as they now cover
greater ranges, which is what would be expected. Some additional
single-location variables are generated too. Two tests are un-xfailed,
they were only xfailed due to r362951 deleting functionality they depended
on.

Apparently some out-of-tree backends don't accurately maintain FrameDestroy
flags -- if you're an out-of-tree maintainer and see changes in variable
locations disappear due to a faulty FrameDestroy flag, it's safe to back
this change out. The impact is just slightly more debug info than necessary.

Differential Revision: https://reviews.llvm.org/D62314

llvm-svn: 363245
2019-06-13 10:03:17 +00:00
Simon Tatham 848d3d0d2c [ARM] Refactor handling of IT mask operands.
During assembly, the mask operand to an IT instruction (storing the
sequence of T/E for 'Then' and 'Else') is parsed out of the mnemonic
into a representation that encodes 'Then' and 'Else' in the same way
regardless of the condition code. At some point during encoding it has
to be converted into the instruction encoding used in the
architecture, in which the mask encodes a sequence of replacement
low-order bits for the condition code, so that which bit value means
'then' and which 'else' depends on whether the original condition code
had its low bit set.

Previously, that transformation was done by processInstruction(), half
way through assembly. So an MCOperand storing an IT mask would
sometimes store it in one format, and sometimes in the other,
depending on where in the assembly pipeline you were. You can see this
in diagnostics from `llvm-mc -debug -triple=thumbv8a -show-inst`, for
example: if you give it an instruction such as `itete eq`, you'd see
an `<MCOperand Imm:5>` in a diagnostic become `<MCOperand Imm:11>` in
the final output.

Having the same data structure store values with time-dependent
semantics is confusing already, and it will get more confusing when we
introduce the MVE VPT instruction which reuses the Then/Else bitmask
idea in a different context. So I'm refactoring: now, all `ARMOperand`
and `MCOperand` representations of an IT mask work exactly the same
way, namely, 0 means 'Then' and 1 means 'Else', regardless of what
original predicate is being referred to. The architectural encoding of
IT that depends on the original condition is now constructed at the
point when we turn the `MCOperand` into the final instruction bit
pattern, and decoded similarly in the disassembler.

The previous condition-independent parse-time format used 0 for Else
and 1 for Then. I've taken the opportunity to flip the sense of it
while I'm changing all of this anyway, because it seems to me more
natural to use 0 for 'leave the starting condition unchanged' and 1
for 'invert it', as if those bits were an XOR mask.

Reviewers: ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63219

llvm-svn: 363244
2019-06-13 10:01:52 +00:00
Sander de Smalen 51c2fa0e2a Improve reduction intrinsics by overloading result value.
This patch uses the mechanism from D62995 to strengthen the
definitions of the reduction intrinsics by letting the scalar
result/accumulator type be overloaded from the vector element type.

For example:

  ; The LLVM LangRef specifies that the scalar result must equal the
  ; vector element type, but this is not checked/enforced by LLVM.
  declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)

This patch changes that into:

  declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)

Which has the type-constraint more explicit and causes LLVM to check
the result type with the vector element type.

Reviewers: RKSimon, arsenm, rnk, greened, aemerson

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62996

llvm-svn: 363240
2019-06-13 09:37:38 +00:00
Sam Parker 179e0fa881 [NFC] Simplify Call query
Use getIntrinsicID() directly from IntrinsicInst.

llvm-svn: 363235
2019-06-13 08:32:56 +00:00
Sam Parker 9d28473a35 [ARM][TTI] Scan for existing loop intrinsics
TTI should report that it's not profitable to generate a hardware loop
if it, or one of its child loops, has already been converted.

Differential Revision: https://reviews.llvm.org/D63212

llvm-svn: 363234
2019-06-13 08:28:46 +00:00
Sander de Smalen 7957fc6547 [IntrinsicEmitter] Extend argument overloading with forward references.
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.

In for example:

  def int_something : Intrinsic<[LLVMPointerToElt<0>],
                                [llvm_anyvector_ty], []>;

LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:

  declare i32* @llvm.something.v4i32(<4 x i32>);
  declare i64* @llvm.something.v2i64(<2 x i64>);

where the result pointer type is deduced from the element type of the
first argument.

If the returned pointer is not a pointer to the element type, LLVM will
give an error:

  Intrinsic has incorrect return type!
  i64* (<4 x i32>)* @llvm.something.v4i32

Reviewers: RKSimon, arsenm, rnk, greened

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62995

llvm-svn: 363233
2019-06-13 08:19:33 +00:00
Shawn Landden 8b142bcc3f [SimplifyCFG] reverting preliminary Switch patches again
This reverts 363226 and 363227, both NFC intended

I swear I fixed the test case that is failing, and ran
the tests, but I will look into it again.

llvm-svn: 363229
2019-06-13 05:26:17 +00:00
Shawn Landden 636220e83c [SimplifyCFG] NFC intended, remove GCD that was only used for powers of two
and replace it with an equivalent countTrailingZeros.

GCD is much more expensive than this, with repeated division.

This depends on D60823

Differential Revision: https://reviews.llvm.org/D61151

llvm-svn: 363227
2019-06-13 05:01:44 +00:00
Tom Stellard f335672218 X86: Clean up pass initialization
Summary:
- Remove redundant initializations from pass constructors that were
  already being initialized by LLVMInitializeX86Target().

- Add initialization function for the FPS pass.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63218

llvm-svn: 363221
2019-06-13 02:09:32 +00:00
David L. Jones c73fadaa84 Revert r361811: 'Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors ...'
We have observed some failures with internal builds with this revision.

- Performance regressions:
  - llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings).
  - Benchmarks for Abseil's SwissTable.
- Correctness:
  - Failures for particular libicu tests when building the Google AppEngine SDK (for PHP).

hwennborg has already been notified, and is aware of reproducer failures.

llvm-svn: 363220
2019-06-13 02:04:45 +00:00
Philip Reames ae2581cef3 [IndVars] Extend diagnostic -replexitval flag w/ability to bypass hard use heuristic
Note: This does mean that "always" is now more powerful than it was. 
llvm-svn: 363196
2019-06-12 19:52:05 +00:00
Stanislav Mekhanoshin 245b5ba344 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

llvm-svn: 363186
2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin 5f581c9f08 [AMDGPU] gfx1010 permlane instructions
Differential Revision: https://reviews.llvm.org/D63202

llvm-svn: 363185
2019-06-12 17:52:51 +00:00
Simon Atanasyan efc0d1a298 [Mips] Add s.d instruction alias for Mips1
Add support for s.d instruction for Mips1 which expands into two swc1
instructions.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D63199

llvm-svn: 363184
2019-06-12 17:52:05 +00:00
Philip Reames e51c3d8b82 [SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673
SCEV does not propagate arguments through one-input Phis, so as to make it easy for the SCEV expander (and related code) to preserve LCSSA.  It's not entirely clear this restriction is necessary, but for the moment it exists.  For this reason, we don't analyze single-entry phi inputs.  However, it is possible that when this input leaves the loop through an LCSSA Phi, it is a provable constant.  Missing that results in an order-of-optimization issue in loop exit value rewriting, where we miss some opportunities based on the order in which we visit sibling loops.

This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values.  We should probably also add similar logic directly in the SCEV construction path itself.

Patch by: mkazantsev (with revised commit message by me)
Differential Revision: https://reviews.llvm.org/D58113

llvm-svn: 363180
2019-06-12 17:21:47 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Simon Pilgrim 5b0e0dd709 [X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c)
Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1).

An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit.

llvm-svn: 363178
2019-06-12 16:38:20 +00:00
Matt Arsenault f29366b1f5 StackProtector: Use PointerMayBeCaptured
This was using its own, outdated list of possible captures. This was
at minimum not catching cmpxchg and addrspacecast captures.

One change is now any volatile access is treated as capturing. The
test coverage for this pass is quite inadequate, but this required
removing volatile in the lifetime capture test.

Also fixes some infrastructure issues to allow running just the IR
pass.

Fixes bug 42238.

llvm-svn: 363169
2019-06-12 14:23:33 +00:00
Mikael Holmen 030df51e27 [ARM] Fix compiler warning
Without this fix clang 3.6 complains with:

../lib/Target/ARM/ARMAsmPrinter.cpp:1473:18: error: variable 'BranchTarget' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
      } else if (MI->getOperand(1).isSymbol()) {
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/Target/ARM/ARMAsmPrinter.cpp:1479:22: note: uninitialized use occurs here
      MCInst.addExpr(BranchTarget);
                     ^~~~~~~~~~~~
../lib/Target/ARM/ARMAsmPrinter.cpp:1473:14: note: remove the 'if' if its condition is always true
      } else if (MI->getOperand(1).isSymbol()) {
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/Target/ARM/ARMAsmPrinter.cpp:1465:33: note: initialize the variable 'BranchTarget' to silence this warning
      const MCExpr *BranchTarget;
                                ^
                                 = nullptr
1 error generated.

Discussed here:
 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190610/661417.html

llvm-svn: 363166
2019-06-12 14:19:22 +00:00
Matt Arsenault aa6bdf9dcd LoopVersioning: Respect convergent
This changes the standalone pass only. Arguably the utility class
itself should assert there are no convergent calls. However, a target
pass with additional context may still be able to version a loop if
all of the dynamic conditions are sufficiently uniform.

llvm-svn: 363165
2019-06-12 14:05:58 +00:00
Anton Afanasyev 339b39b773 [MIR] Skip hoisting to basic block which may throw exception or return
Summary:
Fix hoisting to basic blocks which are not legal for hoisting because
they can be terminated by an exception or are return blocks.

Reviewers: john.brawn, RKSimon, MatzeB

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63148

llvm-svn: 363164
2019-06-12 13:51:44 +00:00
Matt Arsenault 86325be3d7 LoopLoadElim: Respect convergent
llvm-svn: 363162
2019-06-12 13:50:47 +00:00
Matt Arsenault 2466ba97bc LoopDistribute/LAA: Respect convergent
This case is slightly tricky, because loop distribution should be
allowed in some cases, and not others. As long as runtime dependency
checks don't need to be introduced, this should be OK. This is further
complicated by the fact that LoopDistribute partially ignores if LAA
says that vectorization is safe, and then does its own runtime pointer
legality checks.

Note this pass still does not handle noduplicate correctly, as this
should always be forbidden with it. I'm not going to bother trying to
fix it, as it would require more effort and I think noduplicate should
be removed.

https://reviews.llvm.org/D62607

llvm-svn: 363160
2019-06-12 13:34:19 +00:00
Nico Weber 8bbdea447e Fix a Wunused-lambda-capture warning.
The capture was added in the first commit of https://reviews.llvm.org/D61934
when it was used. In the reland, the use was removed but the capture
wasn't removed.

llvm-svn: 363155
2019-06-12 12:46:46 +00:00
Sam Parker 757ac02dc8 [ARM] Implement TTI::isHardwareLoopProfitable
Implement the backend target hook to drive the HardwareLoops pass.
The low-overhead branch extension for Arm M-class cores is flexible
enough that we don't have to ensure correctness at this point, except
checking that the loop counter variable can be stored in LR - a
32-bit register. For it to be profitable, we want to avoid loops that
contain function calls, or any other instruction that alters the PC.
    
This implementation uses TargetLoweringInfo, to query type and
operation actions, looks at intrinsic calls and also performs some
manual checks for remainder/division and FP operations.
    
I think this should be a good base to start and extra details can be
filled out later.

Differential Revision: https://reviews.llvm.org/D62907

llvm-svn: 363149
2019-06-12 12:00:42 +00:00
Sam Parker 61de6a4e9c [NFC][SCEV] Add NoWrapFlag argument to InsertBinOp
'Use wrap flags in InsertBinop' (rL362687) was reverted due to
miscompiles. This patch introduces the previous change to pass
no-wrap flags but now only FlagAnyWrap is passed.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363147
2019-06-12 11:53:55 +00:00
Nico Weber 1dc2123d64 Share /machine: handling code with llvm-cvtres too
r363016 let lld-link and llvm-lib share the /machine: parsing code.
This lets llvm-cvtres share it as well.

Making llvm-cvtres depend on llvm-lib seemed a bit strange (it doesn't
need llvm-lib's dependencies on BinaryFormat and BitReader) and I
couldn't find a good place to put this code. Since it's just a few
lines, put it in lib/Object for now.

Differential Revision: https://reviews.llvm.org/D63120

llvm-svn: 363144
2019-06-12 11:32:43 +00:00
Simon Pilgrim ca39de7199 [XCore] CombineSTORE - Use allowsMemoryAccess wrapper. NFCI.
Noticed in D63075 - there was a allowsMisalignedMemoryAccesses call to check for unaligned loads and a check for aligned legal type loads - which is exactly what allowsMemoryAccess does.

llvm-svn: 363141
2019-06-12 11:08:29 +00:00
Ben Dunbobbin 564d248ec2 [ThinLTO][LTO][Legacy] Fix dependent libraries support by adding querying of the IRSymtab
Dependent libraries support for the legacy api was committed in a
broken state (see: https://reviews.llvm.org/D60274). This was missed
due to the painful nature of having to integrate the changes into a
linker in order to test. This change implements support for dependent
libraries in the legacy LTO api:

- I have removed the current api function, which returns a single string,
  and added functions to access each dependent library specifier
  individually.

- To reduce the testing pain, I have made the api functions as thin as
  possible to maximize coverage from llvm-lto.

- When doing ThinLTO the system linker will load the modules lazily when
  scanning the input files. Unfortunately, when modules are lazily loaded
  there is no access to module level named metadata. To fix this I have
  added api functions that allow querying the IRSymtab for the dependent
  libraries. I hope to expand the api in the future so that, eventually,
  all the information needed by a client linker during scan can be
  retrieved from the IRSymtab.

Differential Revision: https://reviews.llvm.org/D62935

llvm-svn: 363140
2019-06-12 11:07:56 +00:00
Simon Pilgrim 32c1e73603 [XCore] LowerLOAD/LowerSTORE - Use allowsMemoryAccess wrapper. NFCI.
Noticed in D63075 - there was a allowsMisalignedMemoryAccesses call to check for unaligned loads and a check for aligned legal type loads - which is exactly what allowsMemoryAccess does.

llvm-svn: 363137
2019-06-12 10:46:50 +00:00
Orlando Cazalet-Hyams a947156396 Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion"
This reverts commit 1a0f7a2077.
See phabricator thread for D60831.

llvm-svn: 363132
2019-06-12 08:34:51 +00:00
Sjoerd Meijer de73404b8c [AArch64] Merge globals when optimising for size
Extern global merging is good for code-size. There's definitely potential for
performance too, but there's one regression in a benchmark that needs
investigating, so that's why we enable it only when we optimise for size for
now.

Patch by Ramakota Reddy and Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D61947

llvm-svn: 363130
2019-06-12 08:28:35 +00:00
Craig Topper ed4cd44870 [X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad.
The non-masked versions are already in there. I'm having some
trouble coming up with a way to test this right now. Most load
folding should happen during isel so I'm not sure how to get
peephole pass to do it.

llvm-svn: 363125
2019-06-12 06:29:53 +00:00
Hsiangkai Wang 04ddf39b44 [RISCV] Add CFI directives for RISCV prologue/epilog.
In order to generate correct debug frame information, the backend needs
to emit CFI directives in the prologue and epilogue.

Differential Revision: https://reviews.llvm.org/D61773

llvm-svn: 363120
2019-06-12 03:04:22 +00:00
Hsiangkai Wang 93be25b580 [NFC] Correct comments in RegisterCoalescer.
Differential Revision: https://reviews.llvm.org/D63124

llvm-svn: 363119
2019-06-12 02:58:04 +00:00
Philip Reames 02f0b379f5 Fix a bug in getSCEVAtScope w.r.t. non-canonical loops
The issue is that if we have a loop with multiple predecessors outside the loop, the code was expecting to merge them and only return if equal, but instead returned the first one seen.

I have no idea if this actually tripped anywhere.  I noticed it by accident when reading the code and have no idea how to go about constructing a test case.

llvm-svn: 363112
2019-06-11 23:21:24 +00:00
Philip Reames 082cd30327 Generalize icmp matching in IndVars' eliminateTrunc
We were only matching RHS being a loop invariant value, not the inverse. Since there's nothing which appears to canonicalize loop invariant values to RHS, this means we missed cases.

Differential Revision: https://reviews.llvm.org/D63112

llvm-svn: 363108
2019-06-11 22:43:25 +00:00
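A minimal sketch of the kind of pattern involved (names are illustrative, not taken from the patch's tests):

  ; %limit is loop-invariant, %iv is a widened i64 induction variable
  %narrow = trunc i64 %iv to i32
  %c1 = icmp ult i32 %narrow, %limit   ; invariant operand on the RHS: previously handled
  %c2 = icmp ugt i32 %limit, %narrow   ; invariant operand on the LHS: now also handled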
Sanjay Patel 40e3bdf876 [Analysis] add isSplatValue() for vectors in IR
We have the related getSplatValue() already in IR (see code just above the proposed addition).
But sometimes we only need to know that the value is a splat rather than capture the splatted
scalar value. Also, we have an isSplatValue() function already in SDAG.

Motivation - recent bugs that would potentially benefit from improved splat analysis in IR:
https://bugs.llvm.org/show_bug.cgi?id=37428
https://bugs.llvm.org/show_bug.cgi?id=42174

Differential Revision: https://reviews.llvm.org/D63138

llvm-svn: 363106
2019-06-11 22:25:18 +00:00
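For reference, the usual splat idiom in IR that such an analysis has to recognize looks roughly like this (illustrative, not from the patch):

  %ins   = insertelement <4 x i32> undef, i32 %x, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  ; every lane of %splat is %x; the splatted scalar need not be a constant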
Amara Emerson d133c15925 [GlobalISel] Add a G_JUMP_TABLE opcode.
This opcode generates a pointer to the address of the jump table
specified by the source operand, which is a jump table index.

It will be used in conjunction with an upcoming G_BRJT opcode to support
jump table codegen with GlobalISel.

Differential Revision: https://reviews.llvm.org/D63111

llvm-svn: 363096
2019-06-11 19:58:06 +00:00
Alina Sbirlea cb4ed8a7bc [MemorySSA] When applying updates, clean unnecessary Phis.
Summary: After applying a set of insert updates, there may be trivial Phis left over. Clean them up.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63033

llvm-svn: 363094
2019-06-11 19:09:34 +00:00
Alina Sbirlea 3cef1f7d64 Only passes that preserve MemorySSA must mark it as preserved.
Summary:
The method `getLoopPassPreservedAnalyses` should not mark MemorySSA as
preserved, because it's being called in a lot of passes that do not
preserve MemorySSA.
Instead, mark the MemorySSA analysis as preserved by each pass that does
preserve it.
These changes only affect the new pass manager.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62536

llvm-svn: 363091
2019-06-11 18:27:49 +00:00
Amy Huang 9970817c57 Deduplicate S_CONSTANTs in LLD.
Summary: Deduplicate S_CONSTANTS when linking, if they have the same value.

Reviewers: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63151

llvm-svn: 363089
2019-06-11 18:02:39 +00:00
Jinsong Ji ef2d6d99c0 [PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner
Implement necessary target hooks to enable MachinePipeliner for P9 only.
The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9.

Differential Revision: https://reviews.llvm.org/D62164

llvm-svn: 363085
2019-06-11 17:40:39 +00:00
Jonas Devlieghere a6fe345ac9 [Path] Set FD to -1 in moved-from TempFile
When moving a temp file, explicitly set the file descriptor to -1 so we
can never accidentally close the moved-from TempFile.

Differential revision: https://reviews.llvm.org/D63087

llvm-svn: 363083
2019-06-11 16:42:42 +00:00
Cameron McInally 08200d6d26 [InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ
Differential Revision: https://reviews.llvm.org/D62612

llvm-svn: 363082
2019-06-11 16:21:21 +00:00
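Roughly, the fold now also fires for the unary fneg form; the flags shown here are illustrative, with nsz being what justifies ignoring the sign of zero:

  %sub = fsub double %x, %y
  %neg = fneg nsz double %sub
    =>
  %neg = fsub double %y, %x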
Cameron McInally 796de11331 [InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNeg
Differential Revision: https://reviews.llvm.org/D62629

llvm-svn: 363080
2019-06-11 15:45:41 +00:00
Nico Weber af6bc65ddf lld-link: Reject more than one resource .obj file
Users are expected to pass all .res files to the linker, which then
merges all the resource in all .res files into a tree structure and then
converts the final tree structure to a .obj file with .rsrc$01 and
.rsrc$02 sections and then links that.

If the user instead passes several .obj files containing such resources,
the correct thing to do would be to have custom code to merge the trees
in the resource sections instead of doing normal section merging -- but
link.exe rejects if multiple resource obj files are passed in with
LNK4078, so let lld-link do that too instead of silently writing broken
.rsrc sections in that case.

The only real way to run into this is if users manually convert .res
files to .obj files by running cvtres and then handing the resulting
.obj files to lld-link instead, which in practice likely never happens.

(lld-link is slightly stricter than link.exe now: If link.exe is passed
one .obj file created by cvtres, and a .res file, for some reason it
just emits a warning instead of an error and outputs strange looking
data. lld-link now errors out on mixed input like this.)

One way users could accidentally run into this is the following
scenario: If a .res file is passed to lib.exe, then lib.exe calls
cvtres.exe on the .res file before putting it in the output .lib.
(llvm-lib currently doesn't do this.)
link.exe's /wholearchive seems to only add obj files referenced from the
static library index, but lld-link currently adds all files in the
archive. So if lld-link /wholearchive is used with .lib files produced
by lib.exe and .res files were among the files handed to lib.exe, we
previously silently produced invalid output, but now we error out.

link.exe's /wholearchive semantics on the other hand mean that it
wouldn't load the resource object files from the .lib file at all.
Since this scenario is probably still an unlikely corner case,
the difference in behavior here seems fine -- and lld-link might have to
change to use link.exe's /wholearchive semantics in the future anyways.

Vaguely related to PR42180.

Differential Revision: https://reviews.llvm.org/D63109

llvm-svn: 363078
2019-06-11 15:22:28 +00:00
Lewis Revill a5240361dd [RISCV] Add lowering of addressing sequences for PIC
This patch allows lowering of PIC addresses by using PC-relative
addressing for DSO-local symbols and accessing the address through the
global offset table for non-DSO-local symbols.

Differential Revision: https://reviews.llvm.org/D55303

llvm-svn: 363058
2019-06-11 12:57:47 +00:00
Lewis Revill 28a5cadb3a [RISCV] Lower inline asm constraints I, J & K for RISC-V
This validates and lowers arguments to inline asm nodes which have the
constraints I, J & K, with the following semantics (equivalent to GCC):

I: Any 12-bit signed immediate.
J: Immediate integer zero only.
K: Any 5-bit unsigned immediate.

Differential Revision: https://reviews.llvm.org/D54093

llvm-svn: 363054
2019-06-11 12:42:13 +00:00
Mikhail Maltsev 7bd5c55cad [ARM] First MVE instructions: scalar shifts.
This introduces a new decoding table for MVE instructions, and starts
by adding the family of scalar shift instructions that are part of the
MVE architecture extension: saturating shifts within a single GPR, and
long shifts across a pair of GPRs (both saturating and normal).

Some of these shift instructions have only 3-bit register fields in
the encoding, with the low bit fixed. So they can only address an odd
or even numbered GPR (depending on the operand), and therefore I add
two new register classes, GPREven and GPROdd.

Differential Revision: https://reviews.llvm.org/D62668

Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33
llvm-svn: 363051
2019-06-11 12:04:32 +00:00
Nico Weber dd6019526d Let writeWindowsResourceCOFF() take a TimeStamp parameter
For lld, pass in Config->Timestamp (which is set based on lld's
/timestamp: and /Brepro flags). Since the writeWindowsResourceCOFF()
data is only used in-memory by LLD and the obj's timestamp isn't used
for anything in the output, this doesn't change behavior.

For llvm-cvtres, add an optional /timestamp: parameter, and use the
current behavior of calling time() if the parameter is not passed in.

This doesn't really change observable behavior (unless someone passes
/timestamp: to llvm-cvtres, which wasn't possible before), but it
removes the last unqualified call to time() from llvm/lib, which seems
like a good thing.

Differential Revision: https://reviews.llvm.org/D63116

llvm-svn: 363050
2019-06-11 11:26:50 +00:00
Simon Pilgrim 266f43964e [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.

llvm-svn: 363048
2019-06-11 11:00:23 +00:00
Orlando Cazalet-Hyams 1a0f7a2077 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 363046
2019-06-11 10:37:20 +00:00
Simon Tatham 14241378d3 [ARM] Fix unused-variable warning in rL363039.
The variable `OffsetMask` is currently only used in an assertion, so
if assertions are compiled out and -Werror is enabled, it becomes a
build failure.

llvm-svn: 363043
2019-06-11 10:09:12 +00:00
Simon Pilgrim 287e78c82b [DAGCombine] GetNegatedExpression - constant float vector support (PR42105)
Add support for negation of constant build vectors.

Differential Revision: https://reviews.llvm.org/D62963

llvm-svn: 363040
2019-06-11 09:44:33 +00:00
Simon Tatham 8c865cacda [ARM] Add the non-MVE instructions in Arm v8.1-M.
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.

To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need a new
addressing mode.

The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Reviewed By: samparker

Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62667

llvm-svn: 363039
2019-06-11 09:29:18 +00:00
Sander de Smalen cbeb563cfb Change semantics of fadd/fmul vector reductions.
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:

  llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
  llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul

and adds functionality to auto-upgrade existing LLVM IR and bitcode.

Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D60261

llvm-svn: 363035
2019-06-11 08:22:10 +00:00
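A hedged sketch of the before/after IR for the fadd case, using the mangled names as I understand them from this change (D60261 is authoritative):

  ; old form: with 'fast' on the call, the start value %acc could be ignored
  %r = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %v)
  ; new form: the accumulator/start value is always honoured
  %r = call fast float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float %acc, <4 x float> %v)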
Craig Topper 627d8168e7 [X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns.
Also add a FIXME for the peephole pass not being able to handle this.

llvm-svn: 363032
2019-06-11 04:30:53 +00:00
Tom Stellard 4b0b26199b Revert CMake: Make most target symbols hidden by default
This reverts r362990 (git commit 374571301d)

This was causing linker warnings on Darwin:

ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.

llvm-svn: 363028
2019-06-11 03:21:13 +00:00
Peter Collingbourne e5bdedac9d Symbolize: Make DWPName a symbolizer option instead of an argument to symbolize{,Inlined}Code.
This makes the interface simpler and more consistent with the interface for
.dSYM files and fixes a bug where llvm-symbolizer would not read the dwp if
it was asked to symbolize data before symbolizing code.

Differential Revision: https://reviews.llvm.org/D63114

llvm-svn: 363025
2019-06-11 02:32:27 +00:00
Matt Arsenault c5830f5f05 AtomicExpand: Don't crash on non-0 alloca
This now produces garbage on AMDGPU with a call to a nonexistent,
anonymous libcall but won't assert.

llvm-svn: 363022
2019-06-11 01:35:07 +00:00
Matt Arsenault 383e72fcfe AMDGPU: Expand < 32-bit atomics
Also fix AtomicExpand asserting on atomicrmw fadd/fsub.

llvm-svn: 363021
2019-06-11 01:35:00 +00:00
Nico Weber b941fa8821 llvm-lib: Implement /machine: argument
And share some code with lld-link.

While here, also add a FIXME about PR42180 and merge r360150 to llvm-lib.

Differential Revision: https://reviews.llvm.org/D63021

llvm-svn: 363016
2019-06-11 01:13:41 +00:00
Yi Kong 432f48fcd4 [AArch64] Add more CPUs to host detection
Returns "cortex-a73" for 3rd and 4th gen Kryo; not precisely correct,
but close enough.

Differential Revision: https://reviews.llvm.org/D63099

llvm-svn: 363013
2019-06-11 00:05:36 +00:00
Puyan Lotfi 4d89462a1c [MIR-Canon] Fixing non-determinism that was breaking bots (NFC).
An earlier fix of a subtle iterator invalidation bug had uncovered a
nondeterminism that was present in the MultiUsers bag. Problem was that
MultiUsers was being looked up using pointers.

This patch is an NFC change that numbers each multiuser and processes each in
numbered order. This fixes the test failure on netbsd and will likely fix the
green-dragon bot too.

llvm-svn: 363012
2019-06-11 00:00:25 +00:00
Shoaib Meenai 5062cf599c [Support] Explicitly detect recursive response files
Previous detection relied upon an arbitrary hard coded limit of 21
response files, which some code bases were running up against.

The new detection maintains a stack of processing response files and
explicitly checks if a newly encountered file is in the current stack.
Some bookkeeping data is necessary in order to detect when to pop the
stack.

Patch by Chris Glover.

Differential Revision: https://reviews.llvm.org/D62798

llvm-svn: 363005
2019-06-10 23:24:02 +00:00
Rong Xu 7ea131c20c [PGO] Fix the buildbot failure in r362995
Fixed one unused variable warning.

llvm-svn: 363004
2019-06-10 23:20:04 +00:00
Rong Xu e44fa83c37 [PGO] Handle cases of non-instrument BBs
As shown in PR41279, some basic blocks (such as catchswitch) cannot be
instrumented. This patch filters out these BBs in PGO instrumentation.
It also sets the profile count to the fail-to-instrument edge, so that we
can propagate the counts in the CFG.

Differential Revision: https://reviews.llvm.org/D62700

llvm-svn: 362995
2019-06-10 22:36:27 +00:00
Tom Stellard 374571301d CMake: Make most target symbols hidden by default
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%.  This should improve load times for the dynamic library and also
make ABI checker tools, like abidiff, require less memory when analyzing
libLLVM.so.

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439

llvm-svn: 362990
2019-06-10 22:12:56 +00:00
Jessica Paquette b22954384e [GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops
If the source is undef, then just don't do anything.

This matches SelectionDAG's behaviour in SelectionDAG.cpp.

Also add a test showing that we do the right thing here.
(irtranslator-memfunc-undef.ll)

Differential Revision: https://reviews.llvm.org/D63095

llvm-svn: 362989
2019-06-10 21:53:56 +00:00
Philip Reames 4bf1c23990 Factor out a helper function for readability and reuse in a future patch [NFC]
llvm-svn: 362980
2019-06-10 20:41:27 +00:00
Philip Reames a9633d5f0b [LFTR] Use recomputed BE count
This was discussed as part of D62880.  The basic thought is that computing the BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening.  Since there's only one test in the suite which is impacted by this change, and it's essentially equivalent codegen, that seems to be a reasonable assertion.  This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertible.

For the nestedIV example from elim-extend.ll, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)

Note that before is an i32 type, and the after is an i64.  Truncating the i64 produces the i32. 

llvm-svn: 362975
2019-06-10 19:18:53 +00:00
Jinsong Ji 9c7f93e914 [PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest
This was found during HTM cleanup.
Adding a test for builtin_ttest would expose following issue.

*** Bad machine code: Illegal physical register for instruction ***
 - function:    test10
 - basic block: %bb.0 entry (0xf0e57497b58)
 - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0
 - operand 2:   $zero
  $zero is not a GPRC register.
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D63079

llvm-svn: 362974
2019-06-10 19:04:14 +00:00
Philip Reames 5d84ccb230 Prepare for multi-exit LFTR [NFC]
This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context.

Specifically, it turns out the existing code uses the backedge taken count from before an IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivalent) BE count for the loop when requerying after widening.  For the nestedIV example from elim-extend, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)

This is the only test in tree which seems sensitive to this difference. The actual result of using the wider BETC on this example is that we actually produce slightly better code. :)

In review, we decided to accept that test change.  This patch is structured to preserve the old behavior, but a separate change will immediate follow with the behavior change.  (I wanted it separate for problem attribution purposes.)

Differential Revision: https://reviews.llvm.org/D62880

llvm-svn: 362971
2019-06-10 17:51:13 +00:00
Francis Visoiu Mistrih a438432acc [FastISel] Skip creating unnecessary vregs for arguments
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.

This re-enables it for FastISel with the corresponding fix.

This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.

FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.

This fix moves the replacement of registers before the emission of
copies for the live-ins.

The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).

A few tests are affected by this:

* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed

Differential Revision: https://reviews.llvm.org/D62361

llvm-svn: 362963
2019-06-10 16:53:37 +00:00
Cameron McInally 670d0f478b [ExecutionEngine] Fix rL362941: Add UnaryOperator visitor to the interpreter
Missed break statements. This was D62881.

llvm-svn: 362958
2019-06-10 16:05:25 +00:00
Piotr Sobczak 9b11e93d90 [AMDGPU] Optimize image_[load|store]_mip
Summary:
Replace image_load_mip/image_store_mip
with image_load/image_store if lod is 0.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63073

llvm-svn: 362957
2019-06-10 15:58:51 +00:00
Simon Tatham 67065c5c70 Revert rL362953 and its followup rL362955.
These caused a build failure because I managed not to notice they
depended on a later unpushed commit in my current stack. Sorry about
that.

llvm-svn: 362956
2019-06-10 15:58:19 +00:00
Simon Tatham 42078d41d5 [ARM] Add the non-MVE instructions in Arm v8.1-M.
This should have been part of r362953, but I had a finger-trouble
incident and committed the old rather than new version of the patch.
Sorry.

llvm-svn: 362955
2019-06-10 15:41:58 +00:00
Sanjay Patel 9650c95b7e [InstCombine] allow unordered preds when canonicalizing to fabs()
We have a known-never-nan value via 'nnan', so an unordered predicate
is the same as its ordered sibling.

Similar to:
rL362937

llvm-svn: 362954
2019-06-10 15:39:00 +00:00
Simon Tatham baeea91933 [ARM] Add the non-MVE instructions in Arm v8.1-M.
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.

To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need some new
addressing modes.

The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Reviewed By: samparker

Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62667

llvm-svn: 362953
2019-06-10 15:36:34 +00:00
Jeremy Morse bcff417292 [DebugInfo] Terminate all location-lists at end of block
This commit reapplies r359426 (which was reverted in r360301 due to
performance problems) and rolls in D61940 to address the performance problem.
I've combined the two to avoid creating a span of slow-performance, and to
ease reverting if more problems crop up.

The summary of D61940: This patch removes the "ChangingRegs" facility in
DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect
variable locations. An unchanging register doesn't mean a variable doesn't
change its location.

The patch kills off everything that calculates the ChangingRegs vector.
Previously ChangingRegs spotted epilogues and marked registers as unchanging if
they weren't modified outside the epilogue, increasing the chance that we can
emit a single-location variable record. Without this feature,
debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be
re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've
split this into two steps as FrameDestroy isn't necessarily supported by all
backends.

The logic for terminating variable locations at the end of a basic block now
becomes much more enjoyably simple: we just terminate them all.

Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term.
The current algorithm for detecting that a variable has a single-location
doesn't work in this scenario (inlined function in multiple blocks), only other
bugs were making this test work. fission-ranges.ll gets slightly refreshed too,
as the location of "p" is now correctly determined to be a single location.

Differential Revision: https://reviews.llvm.org/D61940

llvm-svn: 362951
2019-06-10 15:23:46 +00:00
Sanjay Patel 85de9634e6 [InstCombine] fix bug in canonicalization to fabs()
Forgot to translate the predicate clauses in rL362943.

llvm-svn: 362945
2019-06-10 14:57:45 +00:00
Sanjay Patel 8b6d9f60ed [InstCombine] change canonicalization to fabs() to use FMF on fsub
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.

I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.

llvm-svn: 362943
2019-06-10 14:46:36 +00:00
Simon Tatham b87669f166 [ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR.
Arm v8.1-M supports the VMOV instructions that move a half-precision
value to and from a GPR, but not if the GPR is SP or PC.

To fix this, I've changed those instructions to use the rGPR register
class instead of GPR. rGPR always excludes PC, and it excludes SP
except in the presence of the HasV8Ops target feature (i.e. Arm v8-A).
So the effect is that VMOV.F16 to and from PC is now illegal
everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A
cores (which I believe is all as it should be).

Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60704

llvm-svn: 362942
2019-06-10 14:43:55 +00:00
Cameron McInally ce49e2231b [ExecutionEngine] Add UnaryOperator visitor to the interpreter
This is to support the unary FNeg instruction.

Differential Revision: https://reviews.llvm.org/D62881

llvm-svn: 362941
2019-06-10 14:38:48 +00:00
Sanjay Patel 8cd8c5784b [InstCombine] allow unordered preds when canonicalizing to fabs()
PR42179:
https://bugs.llvm.org/show_bug.cgi?id=42179

llvm-svn: 362937
2019-06-10 14:14:51 +00:00
Andrea Di Biagio 47db08dbb1 [MCA] Further refactor the bottleneck analysis view. NFCI.
llvm-svn: 362933
2019-06-10 12:50:08 +00:00
George Rimar 1e41007aeb [yaml2obj/obj2yaml] - Make RawContentSection::Content and RawContentSection::Size optional
This is a follow-up for D62809.

The Content and Size fields should be optional, as was discussed in the comments
of the D62809 thread. With that, we can describe specific string table and
symbol table sections in a more correct way and also show appropriate errors.

The patch adds lots of test cases where the behavior is described in details.

Differential revision: https://reviews.llvm.org/D62957

llvm-svn: 362931
2019-06-10 12:43:18 +00:00
David Green d847aa573b [ARM] Enable Unroll UpperBound
This option allows loops with small max trip counts to be fully unrolled. This
can help with code like the remainder loops from manually unrolled loops, such
as those that appear in the CMSIS DSP library. Previously we would apparently
runtime-unroll them with the default unroll count (4).
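
As a hedged illustration (not taken from the patch itself), the kind of
remainder loop this targets has a provably small bound on its trip count:

/* Remainder loop with a known-small max trip count (at most 3 iterations),
   which UpperBound unrolling can now fully unroll. Illustrative only. */
void remainder_loop(float *a, const float *b, int n) {
    for (int i = 0; i < (n & 3); ++i)
        a[i] += b[i];
}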

Differential Revision: https://reviews.llvm.org/D63064

llvm-svn: 362928
2019-06-10 10:22:14 +00:00
Nikola Prica abc1dff7e4 [DebugInfo] More strict debug range for stack variables
A variable's stack location can stretch longer than it should. If a
variable is placed on the stack in some nested basic block, its range
can be calculated to extend up to the next occurrence of the variable's
DBG_VALUE, or up to the end of the function, thus covering basic
blocks that should not be included in the variable's location range.
This happens because the DbgEntityHistoryCalculator ends register
locations at the end of a basic block only if the variable’s location
register has been changed throughout the function, which is not the
case for the register used to reference stack objects.

This patch also tries to produce a single value location if the location
list builder managed to merge all the locations into one.

Reviewers: aprantl, dstenb, jmorse

Reviewed By: aprantl, dstenb, jmorse

Subscribers: djtodoro, ivanbaev, asowda

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D61600

llvm-svn: 362923
2019-06-10 08:41:06 +00:00
QingShan Zhang ab846da7e8 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the target
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D62897

llvm-svn: 362921
2019-06-10 05:40:21 +00:00
Craig Topper 9000a72a4b [X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign bits.
Summary:
Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes.

This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix that a cmp di, 32767 would get, but it avoids a length changing prefix.

Reviewers: RKSimon, spatel, xbolva00

Reviewed By: xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63032

llvm-svn: 362920
2019-06-10 04:50:12 +00:00
Craig Topper ceb807bbbc [X86] Disable f32->f64 extload when sse2 is enabled
Summary:
We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize.

This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently.
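
For reference, the (fpextend (loadf32)) pattern corresponds at the source level
to something as simple as the following (an illustrative sketch, not a test
from the patch):

/* Loads a float and widens it to double: an f32->f64 extending load. */
double widen(const float *p) {
    return (double)*p;
}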

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62710

llvm-svn: 362919
2019-06-10 04:37:16 +00:00
Vivek Pandya 11cb15f8ed Do not derive no-recurse attribute if function does not have exact definition.
This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336

Reviewers: jdoerfert
Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D63045

llvm-svn: 362918
2019-06-10 04:16:04 +00:00
Craig Topper dd10099d5c [X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.
llvm-svn: 362915
2019-06-10 01:18:55 +00:00
Craig Topper f7ba8b808a [X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns.
Previously we did the equivalent operation in isel patterns with
COPY_TO_REGCLASS operations to transition. By inserting
scalar_to_vectors and extract_vector_elts before isel we can
allow each piece to be selected individually and accomplish the
same final result.

Ideally we'd use vector operations earlier in lowering/combine,
but that looks to be more difficult.

The scalar-fp-to-i64.ll changes are because we have a pattern for
using movlpd for store+extract_vector_elt, while an f64 store
uses movsd. The encoding sizes are the same.

llvm-svn: 362914
2019-06-10 00:41:07 +00:00
Nico Weber 80fee25776 Revert r361953 "[SVE][IR] Scalable Vector IR Type"
This reverts commit f4fc01f8dd.
It caused a 3-4x slowdown when doing thinlto links, PR42210.

llvm-svn: 362913
2019-06-09 19:27:50 +00:00
David Bolvansky dcf5e6abdf [TargetLowering] Simplify (ctpop x) == 1
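The equivalence being exploited is the classic power-of-two test; as a hedged
sketch (the exact expansion chosen in the DAG may differ):

/* (ctpop x) == 1 holds exactly when x is a non-zero power of two. */
int ctpop_is_one(unsigned x) {
    return x != 0 && (x & (x - 1)) == 0;
}
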
Reviewers: craig.topper, spatel, RKSimon, bkramer

Reviewed By: spatel

Subscribers: javed.absar, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63004

llvm-svn: 362912
2019-06-09 18:18:57 +00:00
Roman Lebedev d669758d84 [InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles
A precondition 'x != 0' was forgotten by me:
https://rise4fun.com/Alive/JFNP
https://rise4fun.com/Alive/jHvL

These 4 folds with non-constants could be re-enabled,
but for now let's go for the simplest solution.

https://bugs.llvm.org/show_bug.cgi?id=42198

llvm-svn: 362911
2019-06-09 16:30:42 +00:00
Sanjay Patel 87cd16a86e [InstCombine] change canonicalization to fabs() to use FMF on fneg
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.

I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
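
At the source level, the pattern being canonicalized looks roughly like the
following (a hedged sketch; the real transform operates on the IR
select/fcmp/fneg form under nnan/nsz flags):

/* With nnan/nsz fast-math semantics, this select canonicalizes to fabs(x). */
double my_abs(double x) {
    return x < 0.0 ? -x : x;
}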

This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086

llvm-svn: 362909
2019-06-09 16:22:01 +00:00
Sanjay Patel 866db10228 [InstSimplify] reduce code duplication for fcmp folds; NFC
llvm-svn: 362904
2019-06-09 13:58:46 +00:00
Sanjay Patel 73f5a855b3 [InstSimplify] enhance fcmp fold with never-nan operand
This is another step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.

By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
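
As a hedged, source-level illustration of the uitofp case (not one of the
actual tests): a value produced by converting an unsigned integer can never be
NaN, so an ordered self-comparison folds away regardless of FMF.

/* f comes from a uitofp, so it is never NaN; f == f simplifies to true. */
int never_nan_check(unsigned u) {
    float f = (float)u;
    return f == f;
}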

This is a continuation of D62979 / rL362879.

llvm-svn: 362903
2019-06-09 13:48:59 +00:00
Anton Afanasyev 623d9ba068 [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partially redundant machine instructions). Most of the PRE (partial
redundancy elimination) and CSE work is done on LLVM IR, but some
redundancy arises during DAG legalization, and Machine CSE is not enough
to deal with it. This simple PRE implementation works a little
intricately: it runs before CSE, looking for partial redundancy
and transforming it into full redundancy, anticipating that the next
CSE step will eliminate the created redundancy. If CSE doesn't
eliminate it, the created instruction will remain dead and be eliminated
later by the Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE
to make it clearer and to merge MachinePRE with MachineCSE,
so one need not rely on the later Remove Dead pass to clear instructions
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

This is fixed recommit of r361356 after PowerPC64 multistage build failure.

llvm-svn: 362901
2019-06-09 12:15:47 +00:00
Ayke van Laethem f18cf230e4 [CaptureTracking] Don't let comparisons against null escape inbounds pointers
Pointers that are in-bounds (either through dereferenceable_or_null or
through a getelementptr inbounds) cannot be captured with a comparison
against null. There is no way to construct a pointer that is still in
bounds but also NULL.

This helps safe languages that insert null checks before load/store
instructions. Without this patch, almost all pointers would be
considered captured even for simple loads. With this patch, an icmp with
null will not be seen as escaping as long as certain conditions are met.
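
A hedged, source-level sketch of the kind of guarded access that benefits (the
actual analysis works on the IR, e.g. dereferenceable_or_null arguments or
inbounds GEPs):

/* With suitable in-bounds/dereferenceable information on p, the null check
   below no longer counts as a capture of p. */
int load_checked(int *p) {
    if (p == 0)
        return 0;
    return *p;
}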

There was a lot of discussion about this patch. See the Phabricator
thread for details.

Differential Revision: https://reviews.llvm.org/D60047

llvm-svn: 362900
2019-06-09 10:20:33 +00:00
Jatin Bhateja 2a30aeb010 [X86] NFCI: Comment update for EVEX to VEX translation.
Reviewers: llvm-commits, jbhateja

Reviewed By: jbhateja

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63055

llvm-svn: 362898
2019-06-09 09:59:26 +00:00
Simon Pilgrim 5f337149fa Use for-range loop. NFCI.
llvm-svn: 362897
2019-06-09 09:07:30 +00:00
Amara Emerson 0d20969dea [AArch64][GlobalISel] Select immediate forms of cmp instructions.
A simple re-use of the immediate operand matcher and renderer functions.

rdar://43795178

llvm-svn: 362896
2019-06-09 07:31:25 +00:00
Craig Topper 2ba0e2518b [X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant.
We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from
selecting the store and extractelt independently.

llvm-svn: 362895
2019-06-09 03:21:33 +00:00
Craig Topper 7d8494c41c [X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.
Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table.

llvm-svn: 362894
2019-06-08 23:53:31 +00:00
Simon Pilgrim 6bae6d5a5d [DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI.
Same codegen, only differ by the oneuse limit for the sextload case.

llvm-svn: 362880
2019-06-08 17:02:00 +00:00
Sanjay Patel 4329c15f11 [InstSimplify] enhance fcmp fold with never-nan operand
This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.

By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.

I'll update the 'ult' case below here as a follow-up assuming no problems here.

Differential Revision: https://reviews.llvm.org/D62979

llvm-svn: 362879
2019-06-08 15:12:33 +00:00
David Green c5471c2a57 [ARM] Adjust isLegalT1AddressImmediate for non-legal types
Types such as float and i64s do not have legal loads in Thumb1, but will still
be loaded with an LDR (or potentially multiple LDRs). As such we can treat the
cost of addressing mode calculations the same as an i32 and get some optimisation
benefits.

Differential Revision: https://reviews.llvm.org/D62968

llvm-svn: 362874
2019-06-08 10:32:53 +00:00
David Green 342d1b81a3 [ARM] Add MVE addressing to isLegalT2AddressImmediate
Now with MVE being added, we can add the vector addressing mode costs for it.
These are generally imm7 multiplied by the size of the type being loaded /
stored.

Differential Revision: https://reviews.llvm.org/D62967

llvm-svn: 362873
2019-06-08 10:18:23 +00:00
David Green 4ecce205d5 [ARM] Add fp16 addressing to isLegalT2AddressImmediate
The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs
to account for those, and adds extra testing. It is dependent upon hasFPRegs16
as this is what the load/store instructions require.

Differential Revision: https://reviews.llvm.org/D62966

llvm-svn: 362872
2019-06-08 10:09:02 +00:00
David Green 10fbaa96c5 [ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI
We are starting to add an entirely separate vector architecture to the ARM
backend. To do that we need at least some separation between the existing NEON
and the new MVE code. This patch just goes through the Neon patterns and
ensures that they are predicated on HasNEON, giving MVE a stable place to start
from.

No tests yet as this is largely an NFC, and we don't have the other target that
will treat any of these instructions as legal.

Differential Revision: https://reviews.llvm.org/D62945

llvm-svn: 362870
2019-06-08 09:36:49 +00:00
Jonas Paulsson bca56ab073 [SystemZ] Fix CMakeLists.txt for alphabetical order (NFC).
llvm-svn: 362869
2019-06-08 06:42:02 +00:00
Jonas Paulsson fdc4ea34e3 [SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.

Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).

Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.

If it had been possible to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).

Common code changes:

* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.

* VirtRegMap is passed as an argument to foldMemoryOperand().

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888

llvm-svn: 362868
2019-06-08 06:19:15 +00:00
Amara Emerson 829037a914 Factor out SelectionDAG's switch analysis and lowering into a separate component.
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.

No test changes as it's NFC.

Differential Revision: https://reviews.llvm.org/D62745

llvm-svn: 362857
2019-06-08 00:05:17 +00:00
Keno Fischer eb4a561fa3 [GVN] non-functional code movement
Summary: Move some code around, in preparation for later fixes
to the non-integral addrspace handling (D59661)

Patch By Jameson Nash <jameson@juliacomputing.com>

Reviewed By: reames, loladiro
Differential Revision: https://reviews.llvm.org/D59729

llvm-svn: 362853
2019-06-07 23:08:38 +00:00
Matt Arsenault ddd2c9ac86 AMDGPU: Force skips around traps
llvm-svn: 362852
2019-06-07 23:02:52 +00:00
Alina Sbirlea eaea538d18 [DomTreeUpdater] Add all insert before all delete updates to reduce compile time.
Summary:
The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed.
Add all insert edges then all delete edges in DTU to match the previous compile time.
Compile time on the test provided by @mstorsjo before and after this patch on my machine:
113.046s vs 35.649s
Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c.

Reviewers: kuhar, NutshellySima, mstorsjo

Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62981

llvm-svn: 362839
2019-06-07 20:43:55 +00:00
Craig Topper bd03230cb0 [X86] Remove unnecessary new line escape from the end of a macro. NFC
llvm-svn: 362837
2019-06-07 20:30:40 +00:00
Volkan Keles 97204a6788 [GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen
Summary:
Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing
as they have no effect on CodeGen.

Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm

Reviewed By: arsenm

Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63022

llvm-svn: 362834
2019-06-07 20:19:27 +00:00
Nick Desaulniers 7ddd694d36 [APFloat] APFloat::Storage::Storage - refix use after move
Summary:
Re-land r360675 after it was reverted in r360770.

This was reported in:
https://llvm.org/reports/scan-build/

Based on feedback in:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190513/652286.html

Reviewers: RKSimon, efriedma

Reviewed By: RKSimon, efriedma

Subscribers: eli.friedman, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62767

llvm-svn: 362833
2019-06-07 19:51:22 +00:00
Lang Hames d4a8089f03 [ORC] Update symbol lookup to use a single callback with a required symbol state
rather than two callbacks.

The asynchronous lookup API (which the synchronous lookup API wraps for
convenience) used to take two callbacks: OnResolved (called once all requested
symbols had an address assigned) and OnReady to be called once all requested
symbols were safe to access). This patch updates the asynchronous lookup API to
take a single 'OnComplete' callback and a required state (SymbolState) to
determine when the callback should be made. This simplifies the common use case
(where the client is interested in a specific state) and will generalize neatly
as new states are introduced to track runtime initialization of symbols.

Clients who were making use of both callbacks in a single query will now need to
issue two queries (one for SymbolState::Resolved and another for
SymbolState::Ready). Synchronous lookup API clients who were explicitly passing
the WaitOnReady argument will now need to pass a SymbolState instead (for
'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use
SymbolState::Resolved). Synchronous lookup API clients who were using default
argument values should see no change.

llvm-svn: 362832
2019-06-07 19:33:51 +00:00
Simon Pilgrim f0240ee76d [DAGCombine] visitAND - fix local shadow variable warnings. NFCI.
llvm-svn: 362825
2019-06-07 18:36:43 +00:00
Simon Pilgrim 4c9db2045a [DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI.
llvm-svn: 362820
2019-06-07 18:07:06 +00:00
Sanjay Patel e490e4a0e7 [Analysis] simplify code for getSplatValue(); NFC
AFAIK, this is only currently called by TTI, but it could be
used from instcombine or CGP to help solve problems like:
https://bugs.llvm.org/show_bug.cgi?id=37428
https://bugs.llvm.org/show_bug.cgi?id=42174

llvm-svn: 362810
2019-06-07 16:09:54 +00:00
Jinsong Ji 7aafdef627 [MachineScheduler] checkResourceLimit boundary condition update
When we call checkResourceLimit in bumpCycle or bumpNode and we
know the resource count has just reached the limit (the equations
are equal), we should return true to mark that we are resource
limited for the next schedule; otherwise we might continue to schedule
in favor of latency for one more schedule and create a schedule that
actually overbooks the resource.

When we call checkResourceLimit to estimate the resource limit before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule in that case.
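
A hypothetical sketch of the boundary condition (the name and signature below
are invented for illustration and do not match the real checkResourceLimit):

/* After bumping a node or cycle, "just reached the limit" already counts as
   resource limited (>=); when merely estimating before scheduling it does
   not (>). */
int resource_limited(unsigned count, unsigned limit, int after_bump) {
    return after_bump ? count >= limit : count > limit;
}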

Differential Revision: https://reviews.llvm.org/D62345

llvm-svn: 362805
2019-06-07 14:54:47 +00:00
Stefan Stipanovic 128e8e8fb9 test-commit
llvm-svn: 362802
2019-06-07 14:18:02 +00:00
Matt Arsenault 94a609e343 TailDuplicator: Remove no-op analyzeBranch call
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.

Fixes bug 42162.

llvm-svn: 362800
2019-06-07 13:33:34 +00:00
Joerg Sonnenberger b2e96169b0 [NFC] Don't export helpers of ConstantFoldCall
llvm-svn: 362799
2019-06-07 13:28:52 +00:00
Nico Weber d546b5052b llvm-lib: Disallow mixing object files with different machine types
lib.exe doesn't allow creating .lib files with object files that have
differing machine types. Update llvm-lib to match.

The motivation is to make it possible to infer the machine type of a
.lib file in lld, so that it can warn when e.g. a 32-bit .lib file is
passed to a 64-bit link (PR38965).

Fixes PR38782.

Differential Revision: https://reviews.llvm.org/D62913

llvm-svn: 362798
2019-06-07 13:24:34 +00:00
Sanjay Patel 6880bceda2 [x86] narrow extract subvector of vector select
This is a potentially large perf win for AVX1 targets because of the way we
auto-vectorize to 256-bit but then expect the backend to legalize/optimize
for the half-implemented AVX1 ISA.

On the motivating example from PR37428 (even though this patch doesn't solve
the vector shift issue):
https://bugs.llvm.org/show_bug.cgi?id=37428
...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell)
because we eliminate the remaining 256-bit vblendv ops.

I added comments on a couple of tests that require further work. If we have
256-bit logic ops separating the vselect and extract, we should probably narrow
everything to 128-bit, but that requires a larger pattern match.

Differential Revision: https://reviews.llvm.org/D62969

llvm-svn: 362797
2019-06-07 13:17:46 +00:00
Simon Tatham 5d66f2b0af [ARM] Fix bugs introduced by the fp64/d32 rework.
Change D60691 caused some knock-on failures that weren't caught by the
existing tests. Firstly, selecting a CPU that should have had a
restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs
and no double precision) could give the unrestricted version, because
`ARM::getFPUFeatures` returned a list of features including subtracted
ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw
away all the ones that didn't start with `+`. Secondly, the
preprocessor macros didn't reliably match the actual compilation
settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as
if hardware FP was available, because the list of features on the cc1
command line would include things like `+vfp4`,`-vfp4d16` and clang
didn't realise that one of those cancelled out the other.

I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so
that it returns a list that enables every FP-related feature
compatible with the selected FPU and disables every feature not
compatible, which is more verbose but means clang doesn't have to
understand the dependency relationships between the backend features.
Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all
the various forms of the FP feature names, so that it won't miss cases
where it should have set `HW_FP` to feed into feature test macros.

That in turn caused an ordering problem when handling `-mcpu=foo+bar`
together with `-mfpu=something_that_turns_off_bar`. To fix that, I've
arranged that the `+bar` suffixes on the end of `-mcpu` and `-march`
cause feature names to be put into a separate vector which is
concatenated after the output of `getFPUFeatures`.

Another side effect of all this is to fix a bug where `clang -target
armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even
though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was
because `HW_FP` was being set to a value including only the `FPARMV8`
bit, but that feature test macro was testing only the `VFP4FPU` bit.
Now `HW_FP` ends up with all the bits set, so it gives the right
answer.

Changes to tests included in this patch:

* `arm-target-features.c`: I had to change basically all the expected
  results. (The Cortex-M4 test in there should function as a
  regression test for the accidental double-precision bug.)
* `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG`
  everywhere so that those tests are no longer sensitive to the order
  of cc1 feature options on the command line.
* `arm-acle-6.5.c`: has been updated to expect the right answer to that
  FMA test.
* `Preprocessor/arm-target-features.c`: added a regression test for
  the `mfpu=softvfp` issue.

Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne

Reviewed By: ostannard

Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62998

llvm-svn: 362791
2019-06-07 12:42:54 +00:00
Sam Elliott f720647ddd [RISCV] Support Bit-Preserving FP in F/D Extensions
Summary:
This allows some integer bitwise operations to instead be performed by
hardware fp instructions. This is correct because the RISC-V spec
requires the F and D extensions to use the IEEE-754 standard
representation, and fp register loads and stores to be bit-preserving.
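
As a hedged example of the sort of code that can benefit (illustrative only,
not a test from the patch), sign-bit manipulation done through integer types
can stay in FP registers, since the sign-injection instructions operate on the
raw IEEE-754 bits:

/* Clears the sign bit of a double via integer bit operations (i.e. fabs). */
double clear_sign(double x) {
    unsigned long long bits;
    __builtin_memcpy(&bits, &x, sizeof bits);
    bits &= ~(1ULL << 63);
    __builtin_memcpy(&x, &bits, sizeof bits);
    return x;
}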

This is tested against the soft-float ABI, but with hardware float
extensions enabled, so that the tests also ensure the optimisation also
fires in this case.

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62900

llvm-svn: 362790
2019-06-07 12:20:14 +00:00
Valery Pykhtin cb8de55f47 [AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance)
Differential revision: https://reviews.llvm.org/D62917

llvm-svn: 362789
2019-06-07 12:16:46 +00:00
Cullen Rhodes 1f0d251244 [AArch64][AsmParser] error on unexpected SVE predicate type suffix
Summary:
This patch fixes a bug in the assembler that permitted a type suffix on
predicate registers when not expected. For instance, the following was
previously valid:

    faddv h0, p0.q, z1.h

This bug was present in all SVE instructions containing predicates with
no type suffix and no predication form qualifier, i.e. /z or /m. The
latter instructions are already caught with an appropriate error message
by the assembler, e.g.:

            .text
    <stdin>:1:13: error: not expecting size suffix
    cmpne p1.s, p0.b/z, z2.s, 0
                ^

A similar issue for SVE vector registers was fixed in:

  https://reviews.llvm.org/D59636

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62942

llvm-svn: 362780
2019-06-07 08:46:56 +00:00
Cullen Rhodes f730548484 [AArch64][AsmParser] Provide better diagnostics for SVE predicates
Patch by Sander de Smalen (sdesmalen)

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62941

llvm-svn: 362779
2019-06-07 08:37:00 +00:00
Pengfei Wang f8b28931a7 [X86] -march=cooperlake (llvm)
Support intel -march=cooperlake in llvm

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62836

llvm-svn: 362776
2019-06-07 08:31:35 +00:00
Sam Parker 67f9dc60b8 Fix for lld buildbot
Removed unused (in non-debug builds) variable.

llvm-svn: 362775
2019-06-07 08:04:18 +00:00
Sam Parker c5ef502ee8 [CodeGen] Generic Hardware Loop Support
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
    
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
  Takes, as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
  Takes the maximum number of elements processed in an iteration of
  the loop body and subtracts this from the total count. Returns
  false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
  Takes the number of elements remaining to be processed as well as
  the maximum number of elements processed in an iteration of the loop
  body. Returns the updated number of elements remaining.
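
A hedged, source-level sketch of the shape these intrinsics describe (the real
pass rewrites the IR; the comments below are only a conceptual mirror of the
intrinsics, not actual calls):

/* Assumes n >= 0. */
void process(int *a, int n) {
    int remaining = n;                /* ~ llvm.set_loop_iterations(n)           */
    while (remaining != 0) {
        a[n - remaining] += 1;        /* loop body                               */
        remaining = remaining - 1;    /* ~ llvm.loop_decrement_reg(remaining, 1) */
    }
}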

llvm-svn: 362774
2019-06-07 07:35:30 +00:00
Dylan McKay 04b418f246 [AVR] Expand 16-bit rotations during the legalization stage
In r356860, the legalization logic for BSWAP was modified to ISD::ROTL,
rather than the old ISD::{SHL, SRL, OR} nodes.

This works fine on AVR for 8-bit rotations, but 16-bit rotations are
currently unimplemented - they always trigger an assertion error in the
AVRExpandPseudoInsts pass ("RORW unimplemented").

This patch instructs the legalizer to expand 16-bit rotations into
the SHL, SRL, OR pattern it used previously.
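
For reference, the expanded form is the usual shift/or rotate idiom; a hedged
C-level sketch of a 16-bit left rotate:

/* rotl16(x, n) built from SHL, SRL and OR; the masking keeps shifts in range. */
unsigned short rotl16(unsigned short x, unsigned n) {
    n &= 15;
    return (unsigned short)((x << n) | (x >> ((16 - n) & 15)));
}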

This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this
test failure seems flaky - it passes successfully on the avr-build-01
buildbot, but fails locally on my Arch Linux install.

llvm-svn: 362773
2019-06-07 06:55:00 +00:00
Fangrui Song c841b9abf0 [MC][ELF] Don't create relocations with section symbols for STB_LOCAL ifunc
We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because
it may result in an IRELATIVE reloc that the dynamic loader will use to
resolve the address at startup time.

There is another problem that is not fixed by this patch: a PC relative
relocation should also create a relocation with the ifunc symbol.

llvm-svn: 362767
2019-06-07 03:47:22 +00:00
Fangrui Song 19189993c9 [LV] Fix -Wunused-function after r362736
llvm-svn: 362762
2019-06-07 01:48:26 +00:00
Matt Arsenault c0edb8f5cf AMDGPU: Don't count mask branch pseudo towards skip threshold
llvm-svn: 362761
2019-06-07 00:14:55 +00:00
Matt Arsenault 99ee81b183 AMDGPU: Insert skips for blocks with FLAT
This already forced a skip for VMEM, so it should also be done for
flat. I'm somewhat skeptical about the benefit of this though.

llvm-svn: 362760
2019-06-07 00:14:45 +00:00
Nemanja Ivanovic ef4a3aa549 [PowerPC] Exploit the vector min/max instructions
Use the PPC vector min/max instructions for computing the corresponding
operation as these should be faster than the compare/select sequences
we currently emit.
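
A hedged example of the kind of source loop affected (the compare/select in the
body can become a single vector min after vectorization):

void vmin_loop(int *a, const int *b, const int *c, int n) {
    for (int i = 0; i < n; ++i)
        a[i] = b[i] < c[i] ? b[i] : c[i];
}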

Differential revision: https://reviews.llvm.org/D47332

llvm-svn: 362759
2019-06-06 23:49:01 +00:00
Matt Arsenault b6cfa129cc AMDGPU: Insert skip branches over return blocks
SIInsertSkips really doesn't understand the control flow, and makes
very stupid assumptions about the block layout. This was able to get
away with not skipping return blocks, since usually after
structurization there is only one placed at the end of the
function. Tail duplication can break this assumption.

llvm-svn: 362754
2019-06-06 22:51:51 +00:00
Alexey Lapshin b9f1e7b16e [DebugInfo] Incorrect debug info record generated for loop counter.
An incorrect debug variable range was calculated during the "COMPUTING LIVE DEBUG VARIABLES" stage.
The range for the debug variable ("i") was computed according to the current state of instructions
inside the basic block. But the register allocator creates new instructions which were not taken
into account when Live Debug Variables was computed. As a result, the DBG_VALUE instruction for
the "i" variable was put after these newly inserted instructions. This is incorrect.
The debug value for the loop counter should be inserted before any loop instruction.

Differential Revision: https://reviews.llvm.org/D62650

llvm-svn: 362750
2019-06-06 21:19:39 +00:00
Alexander Timofeev 37bd9bd137 [AMDGPU] Partial revert for the ba447bae74
"Divergence driven ISel. Assign register class for cross block values
       according to the divergence."
       that discovered the design flaw leading to several issues that
       required to be solved before.

       This change reverts AMDGPU specific changes and keeps common part
       unaffected.

llvm-svn: 362749
2019-06-06 21:13:02 +00:00
Craig Topper f320f26716 [X86] Make a bunch of merge masked binops commutable for loading folding.
This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg.

We already commuted the unmasked and zero masked versions.

I've added 512-bit stack folding tests for most of the instructions
affected. I've tested needing commuting and not commuting across
unmasked, merged masked, and zero masked. The 128/256 bit instructions
should behave similarly.

llvm-svn: 362746
2019-06-06 21:00:04 +00:00
Craig Topper ca541b20d0 [CFLGraph] Add support for unary fneg instruction.
Differential Revision: https://reviews.llvm.org/D62791

llvm-svn: 362737
2019-06-06 19:21:23 +00:00
Renato Golin 9e97caf594 [LV] Wrap LV illegality reporting in a function. NFC.
A function for loop vectorization illegality reporting has been
introduced:

void LoopVectorizationLegality::reportVectorizationFailure(
    const StringRef DebugMsg, const StringRef OREMsg,
    const StringRef ORETag, Instruction * const I) const;

The function prints a debug message when the debug for the compilation
unit is enabled as well as invokes the optimization report emitter to
generate a message with a specified tag. The function doesn't cover any
complicated logic when a custom lambda should be passed to the emitter,
only generating a message with a tag is supported.

The function always prints the instruction `I` after the debug message
whenever the instruction is specified, otherwise the debug message
ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'

Patch by Pavel Samolysov <samolisov@gmail.com>

llvm-svn: 362736
2019-06-06 19:15:52 +00:00
Jason Liu 60ec248148 [AIX] Implement function descriptor on SDAG
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
 * A function descriptor (Name)
 * A function entry point (.Name)

The descriptor structure on AIX is the same as those in the ELF V1 ABI:
 * The address of the entry point of the function.
 * The TOC base address for the function.
 * The environment pointer.

The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
 function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".

Which symbol gets referenced depends on the context:
 * Taking the address of the function references the descriptor symbol.
 * Calling the function references the entry point symbol.

(2) Regarding the implementation on AIX: for a direct function call target, we
create a proper MCSymbol SDNode (e.g. ".foo") while constructing the SDAG to
replace the original TargetGlobalAddress SDNode. Then, down the path, we can
take advantage of this MCSymbol.

Patch by: Xiangling_L

Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara

Differential Revision: https://reviews.llvm.org/D62532

llvm-svn: 362735
2019-06-06 19:13:36 +00:00
Craig Topper 6cda33ba36 [InlineCost] Add support for unary fneg.
This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost.

Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now.

Differential Revision: https://reviews.llvm.org/D62699

llvm-svn: 362732
2019-06-06 19:02:18 +00:00
Philip Reames 101915cfda [LoopPred] Fix a bug in unconditional latch bailout introduced in r362284
This is a really silly bug that even a simple test w/an unconditional latch would have caught.  I tried to guard against the case, but put it in the wrong if check.  Oops.

llvm-svn: 362727
2019-06-06 18:02:36 +00:00
Simon Pilgrim 842c7792aa [DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123)
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:

1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.

2 - The merged load\store node must retain the non-temporal flag.

Differential Revision: https://reviews.llvm.org/D62910

llvm-svn: 362723
2019-06-06 17:04:13 +00:00
Dmitri Gribenko 5438cc6910 Remove unused PPC.h includes under llvm/lib/Target/PowerPC.
llvm-svn: 362718
2019-06-06 16:47:06 +00:00
Craig Topper 6b67dfa54c [X86] Make masked floating point equality/ordered compares commutable for load folding purposes.
Same as what is supported for the unmasked form.

llvm-svn: 362717
2019-06-06 16:39:04 +00:00
Whitney Tsang 03e8369a72 [DA] Add an option to control delinearization validity checks
Summary: Dependence Analysis performs static checks to confirm validity
of delinearization. These checks often fail for 64-bit targets due to
type conversions and integer wrapping that prevent simplification of the
SCEV expressions. These checks would also fail at compile-time if the
lower bound of the loops are compile-time unknown.

For example:

void foo(int n, int m, int a[][m]) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j) {
      a[i][j] = a[i+1][j-2];
    }
}

opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
-pass-remarks=.* -debug-pass=Arguments
-da-permissive-validity-checks=false k3.ll -analyze -da
will produce the following by default:

da analyze - anti [* *|<]!
but will produce the following expected dependence vector if the
validity checks are disabled:

da analyze - consistent anti [1 -2]!
This revision will introduce a debug option that will leave the validity
checks in place by default, but allow them to be turned off. New tests
are added for cases where it cannot be proven at compile-time that the
individual subscripts stay in-bound with respect to a particular
dimension of an array. These tests enable the option to provide a user
guarantee that the subscripts do not over/under-flow into other
dimensions, thereby producing more accurate dependence vectors.

For prior discussion on this topic, leading to this change, please see
the following thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html

Reviewers: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn
Reviewed By: Meinersbur, jdoerfert, dmgreen
Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney,
etiotto
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D62610

llvm-svn: 362711
2019-06-06 15:12:49 +00:00
Jason Liu 0338b88861 [AIX] Implement call lowering with parameters could pass onto GPRs
Summary:
This patch implements SDAG call lowering on AIX for functions
which only have parameters that could fit into GPRs.

Reviewers: hubert.reinterpretcast, syzaara

Differential Revision: https://reviews.llvm.org/D62823

llvm-svn: 362708
2019-06-06 14:36:43 +00:00
Thomas Preud'homme 71d3f227a7 FileCheck [6/12]: Introduce numeric variable definition
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces support for defining
numeric variable in a CHECK directive.

This commit introduces support for defining numeric variable from a
litteral value in the input text. Numeric expressions can then use the
variable provided it is on a later line.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60386

llvm-svn: 362705
2019-06-06 13:21:06 +00:00
Adhemerval Zanella 559e69a821 [AArch64] Handle ISD::LRINT and ISD::LLRINT for float16
This patch is a follow up for D62018 to add lrint/llrint
support for float16.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62863

llvm-svn: 362700
2019-06-06 12:38:11 +00:00
Benjamin Kramer f1249442cf Revert "[SCEV] Use wrap flags in InsertBinop"
This reverts commit r362687. Miscompiles llvm-profdata during selfhost.

llvm-svn: 362699
2019-06-06 12:35:46 +00:00
Adhemerval Zanella bce9e11a7b [AArch64] Handle ISD::LROUND and ISD::LLROUND for float16
This patch is a follow up for D61391 to add lround/llround
support for float16.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62861

llvm-svn: 362698
2019-06-06 11:53:26 +00:00
Dmitri Gribenko 8c2c072582 Include what you use in LanaiAsmParser.cpp
llvm-svn: 362696
2019-06-06 10:37:06 +00:00
Simon Pilgrim da993d08c8 [DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI.
Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s

llvm-svn: 362695
2019-06-06 10:21:18 +00:00
Petar Avramovic 81132ce0e9 [MIPS GlobalISel] Select sqrt
Select G_FSQRT for MIPS32.

Differential Revision: https://reviews.llvm.org/D62905

llvm-svn: 362692
2019-06-06 10:00:41 +00:00
Petar Avramovic 0a1fd355b2 [MIPS GlobalISel] Select fabs
Select G_FABS for MIPS32.

Differential Revision: https://reviews.llvm.org/D62903

llvm-svn: 362690
2019-06-06 09:22:37 +00:00
Petar Avramovic a7d0006447 [MIPS GlobalISel] Select fpext and fptrunc
Select G_FPEXT and G_FPTRUNC for MIPS32.

Differential Revision: https://reviews.llvm.org/D62902

llvm-svn: 362689
2019-06-06 09:16:58 +00:00
Petar Avramovic faaa2b5d21 [MIPS GlobalISel] Select floor and ceil
Select G_FFLOOR and G_FCEIL for MIPS32.

Differential Revision: https://reviews.llvm.org/D62901

llvm-svn: 362688
2019-06-06 09:02:24 +00:00
Sam Parker 7cc580f5e9 [SCEV] Use wrap flags in InsertBinop
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 362687
2019-06-06 08:56:26 +00:00
Amara Emerson d3144a4abc [AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64.
We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.

Also add tests to check the imported pattern selections to s32 work.

llvm-svn: 362681
2019-06-06 07:58:37 +00:00
Amara Emerson d940e20051 [AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666.
The changes weren't staged so ended up just re-commiting the unmodified reverted change.

llvm-svn: 362677
2019-06-06 07:33:47 +00:00
Craig Topper 9226ba6b37 [X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes.
This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.

llvm-svn: 362675
2019-06-06 05:41:27 +00:00
Amara Emerson c37ff0d138 Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp""
When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.

This fixes PR42129.

llvm-svn: 362666
2019-06-05 23:46:16 +00:00
Matt Arsenault 34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit the number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Cameron McInally c72fbe5dc1 [MSAN] Add unary FNeg visitor to the MemorySanitizer
Differential Revision: https://reviews.llvm.org/D62909

llvm-svn: 362664
2019-06-05 22:37:05 +00:00
Ulrich Weigand 6c5d5ce551 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transferred to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
Petr Hosek 2f94203e23 Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"
This reverts commit r362435 as this triggers ICE, see PR42129 for details.

llvm-svn: 362662
2019-06-05 22:27:31 +00:00
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treated as an offset from the
scratch wave offset register acting as the frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Mircea Trofin e3eeacd70a [CallSite removal] Refactoring llvm::InlineFunction APIs
Summary:
This change only unifies the previous API pair accepting
CallInst and InvokeInst, thus making it easier to refactor
inliner pass code to CallBase. The implementation of the unified
API still relies on the CallSite implementation.

Reviewers: eraman, chandlerc, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62283

llvm-svn: 362656
2019-06-05 21:28:13 +00:00
Sanjay Patel ac111e526d [InstCombine] simplify code for bitcast of insertelement; NFC
llvm-svn: 362655
2019-06-05 21:26:52 +00:00
Matt Arsenault 663d762c9a NewGVN: Handle addrspacecast
The AllConstant check needs to be moved out of the if/else if chain to
avoid a test regression. The "there is no SimplifyZExt" comment
puzzles me, since there is SimplifyCastInst. Additionally, the
Simplify* calls seem to not see the operand as constant, so this needs
to be tried if the simplify failed.

llvm-svn: 362653
2019-06-05 21:15:52 +00:00
Craig Topper 3975b15dba [X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable.
One of the sources controls the pass through value for the upper bits
of the result so we can't really commute it.

In practice this problem isn't a functional issue because we would
only try to commute this instruction in order to fold a load. But
we can't do embedded rounding and fold a load at the same time. So
the load fold would never succeed, so I don't think we would ever
commute, or at least we would never keep the commuted version.

llvm-svn: 362647
2019-06-05 21:00:31 +00:00
Whitney Tsang 2d0896c1cb [LOOPINFO] Extend Loop object to add utilities to get the loop bounds,
step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exist passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common place to get this
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false
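
A hedged usage sketch of the utilities listed above (the signatures are
assumed from the names; Loop *L and ScalarEvolution &SE come from a
hypothetical surrounding pass):

if (auto Bounds = L->getBounds(SE)) {
  Value &Init  = Bounds->getInitialIVValue();  // lb in the example above
  Value &Final = Bounds->getFinalIVValue();    // ub
  Value *Step  = Bounds->getStepValue();       // step
  (void)Init; (void)Final; (void)Step;
}
if (PHINode *IV = L->getInductionVariable(SE)) {
  // IV corresponds to i1 in the pseudo IR above.
  (void)IV;
}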

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 362644
2019-06-05 20:42:47 +00:00
Tim Northover 8d7f118ab2 InstCombine: correctly change byval type attribute alongside call args.
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.

llvm-svn: 362643
2019-06-05 20:38:17 +00:00
Tim Northover 607c8a9d14 IR: make getParamByValType Just Work. NFC.
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.

The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.

llvm-svn: 362642
2019-06-05 20:37:47 +00:00
Matt Arsenault 4fb580c314 AMDGPU: Remove amdgpu-max-work-group-size attribute
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

llvm-svn: 362641
2019-06-05 20:32:32 +00:00
Matt Arsenault 0f8a764e8f AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

llvm-svn: 362640
2019-06-05 20:32:25 +00:00
Dan Gohman 53572d0470 [WebAssembly] Limit PIC support to the Emscripten target
The current PIC support only works with Emscripten, so
disable it for other targets.

This is the PIC portion of https://reviews.llvm.org/D62542.

Reviewed By: dschuff, sbc100

llvm-svn: 362638
2019-06-05 20:01:01 +00:00
Craig Topper d0fff89b81 [X86] Add the vector integer min/max instructions to isAssociativeAndCommutative.
As far as I know these should be freely reassociatable just like
the floating point MAXC/MINC instructions.

The *reduce* test changes are largely regressions, caused by
the "generic" CPU we default to not having a scheduler model.

The machine-combiner-int-vec.ll test shows the positive benefits
of this change.

Differential Revision: https://reviews.llvm.org/D62787

llvm-svn: 362629
2019-06-05 18:25:09 +00:00
Simon Pilgrim 77d6adc491 Fix shadow local variable warning. NFCI.
llvm-svn: 362622
2019-06-05 17:26:29 +00:00
Sanjay Patel 2bf82879bd [x86] split more 256-bit stores of concatenated vectors
As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.

llvm-svn: 362620
2019-06-05 16:40:57 +00:00
Simon Pilgrim de586bd1fd [X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI.
Enables us to use this to split 512-bit vectors in future patches.

llvm-svn: 362617
2019-06-05 16:14:14 +00:00
Whitney Tsang 590b1aee60 Revert "Title: [LOOPINFO] Extend Loop object to add utilities to get the loop"
This reverts commit d34797dfc2.

llvm-svn: 362615
2019-06-05 15:32:56 +00:00
Dinar Temirbulatov 15c657d13d [SLP] Fix regression in broadcasts caused by operand reordering patch D59973.
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.

Committed on behalf of @vporpo (Vasileios Porpodas)
    
Differential Revision: https://reviews.llvm.org/D62427

llvm-svn: 362613
2019-06-05 15:26:28 +00:00
Sanjay Patel ad62a3a299 [LoopUtils][SLPVectorizer] clean up management of fast-math-flags
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
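
A minimal sketch of that guard-object idiom (assuming the usual IRBuilder
API; the surrounding values Builder, FMF, A and B are hypothetical):

{
  IRBuilder<>::FastMathFlagGuard Guard(Builder); // saves the current flags
  Builder.setFastMathFlags(FMF);                 // applied to new FP instructions
  Value *Sum = Builder.CreateFAdd(A, B, "sum");  // carries FMF
  (void)Sum;
} // leaving the scope restores the builder's previous fast-math flags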

The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)

Differential Revision: https://reviews.llvm.org/D62272

llvm-svn: 362612
2019-06-05 14:58:04 +00:00
Benjamin Kramer b90b354798 [LoopInfo] Fix unused variable warning. NFC.
llvm-svn: 362610
2019-06-05 14:43:58 +00:00
Whitney Tsang d34797dfc2 Title: [LOOPINFO] Extend Loop object to add utilities to get the loop
bounds, step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exist passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common place to get this
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 362609
2019-06-05 14:34:12 +00:00
Petar Avramovic 22e99c434f [MIPS GlobalISel] Select fcmp
Select floating point compare for MIPS32.

Differential Revision: https://reviews.llvm.org/D62721

llvm-svn: 362603
2019-06-05 14:03:13 +00:00
Sjoerd Meijer a1bb4fb79d [ARM] Allow "-march=foo+fp" to vary with foo
This is the LLVM part of this change, the Clang part contains the full
description in its commit message.

Differential Revision: https://reviews.llvm.org/D60697

llvm-svn: 362600
2019-06-05 13:11:51 +00:00
Simon Pilgrim 886a55eaa0 [X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y))
We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)); this relaxes the requirement to permit different sources as long as they have the same value type.

This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.

llvm-svn: 362599
2019-06-05 12:56:53 +00:00
Simon Pilgrim 5a81af547c [TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI.
Will be used more in an upcoming patch.

llvm-svn: 362595
2019-06-05 10:59:04 +00:00
Simon Pilgrim db134aaec2 [IPO] Disabled 'default only' switch statements to fix MSVC warnings.
@jdoerfert Looks like these are placeholders for incoming abstract attributes patches, so I've just commented the code out, even though this is usually frowned upon.

llvm-svn: 362592
2019-06-05 10:04:05 +00:00
Dmitri Gribenko 6fc4c1cc54 Include what you use in PPCFrameLowering.h
llvm-svn: 362590
2019-06-05 08:58:00 +00:00
Yevgeny Rouban a3e16719c4 Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 5b32f60ec3.
The fix is in commit 4f9e68148b.

This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

llvm-svn: 362583
2019-06-05 05:46:40 +00:00
Michael Liao fa449a9bb2 Suppress false-positive GCC -Wreturn-type warning.
llvm-svn: 362582
2019-06-05 04:18:12 +00:00
Johannes Doerfert aade782a98 [Attributor] Pass infrastructure and fixpoint framework
NOTE: Note that no attributes are derived yet. This patch will not go in
      alone but only with others that derive attributes. The framework is
      split for review purposes.

This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.

In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.

Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.

Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames

Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59918

llvm-svn: 362578
2019-06-05 03:02:24 +00:00
Nemanja Ivanovic 7c842fadf1 [PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.

A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared, and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.

This patch just cleans that up.

Differential revision: https://reviews.llvm.org/D60402

llvm-svn: 362576
2019-06-05 02:36:40 +00:00
Fangrui Song f090e6f7b6 [llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support DT_PPC_GOT and DT_PPC_OPT
In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used.
I plan to use it in D62464.

DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is
enabled.

Reviewed By: grimar, jhenderson, rupprecht

Differential Revision: https://reviews.llvm.org/D62851

llvm-svn: 362569
2019-06-05 01:36:48 +00:00
Nemanja Ivanovic fe97754acf Initial support for IBM MASS vector library
This is the LLVM portion of patch https://reviews.llvm.org/D59881.
The clang portion is to follow.

llvm-svn: 362568
2019-06-05 01:31:43 +00:00
Craig Topper 78fdce25a1 [X86] Cleanup convertIntLogicToFPLogic a little. NFCI
-Use early returns to reduce indentation
-Replace multiple ifs with a switch.
-Replace an assert with an llvm_unreachable default in the switch.
-Check that the FP type we're going to use for the
 X86ISD::FAND/FOR/FXOR is legal rather than checking that the
 integer type matches the width of a legal scalar fp type. This all
 runs after legalization so it shouldn't really matter, but making
 sure we're using a valid type in the X86ISD node is really
what's important.

llvm-svn: 362565
2019-06-05 01:00:34 +00:00
Cameron McInally 5c7245b830 [Scalarizer] Add UnaryOperator visitor to scalarization pass
Differential Revision: https://reviews.llvm.org/D62858

llvm-svn: 362558
2019-06-04 23:01:36 +00:00
Amara Emerson 2d37cb82f0 [AArch64][GlobalISel] Make extloads to i64 legal.
Although we had the support in the prelegalizer combiner to generate the
G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as
lowering back to separate ops.

llvm-svn: 362553
2019-06-04 21:51:34 +00:00
Thomas Lively 3d9ca00e74 [WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch
Summary:
Adjusts the index and adds a bitcast around the vector operand of
EXTRACT_VECTOR_ELT so that its lane type matches the source type of
its parent sext_inreg. Without this bitcast the ISel patterns do not
match and ISel fails.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62646

llvm-svn: 362547
2019-06-04 21:08:20 +00:00
Johannes Doerfert 6b432dca5d [SelectionDAG][FIX] Allow "returned" arguments to be bit-casted
Summary:
An argument that is returned by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.

Reviewers: sunfish, stephenwlin, niravd, arsenm

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59917

llvm-svn: 362546
2019-06-04 20:34:43 +00:00
Johannes Doerfert 40107ce753 Introduce Value::stripPointerCastsSameRepresentation
This patch allows current users of Value::stripPointerCasts() to force
the result of the function to have the same representation as the value
it was called on. This is useful in various cases, e.g., (non-)null
checks.

In this patch only a single call site was adjusted to fix an existing
misuse that would cause nonnull to be placed where it may be wrong. Uses in
attribute deduction and other areas, e.g., D60047, are to be expected.
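
A hedged sketch of the intended use (the surrounding non-null reasoning is
hypothetical, not code from this patch):

// Only strip casts that keep the same representation; an addrspacecast may
// change what "null" means, so plain stripPointerCasts() could be too
// aggressive for a non-null check.
const Value *Stripped = Ptr->stripPointerCastsSameRepresentation();
if (isa<AllocaInst>(Stripped) || isa<GlobalVariable>(Stripped)) {
  // Ptr can be treated as non-null in its own representation.
}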

For a discussion on this topic, please see [0].

[0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html

Reviewers: hfinkel, arsenm, reames

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61607

llvm-svn: 362545
2019-06-04 20:21:46 +00:00
Nico Weber 1dce82636c llvm-undname: Correctly demangle vararg parameters
FunctionSignatureNode already had an IsVariadic field,
but it wasn't used anywhere yet. Set it and use it.

llvm-svn: 362541
2019-06-04 19:10:08 +00:00
Nico Weber 4638548468 llvm-undname: More coverage-related cleanups
- The loop in demangleFunctionParameterList() only exits
  on Error, @, and Z. All 3 cases were handled, so the
  rest of the function is DEMANGLE_UNREACHABLE.

- The loop in demangleTemplateParameterList() always returns
  on Error, so there's no need to check for that in the loop
  header and after the loop.

- Add test cases for invalid function parameter manglings.

- Add a (redundant) test case for a simple template parameter
  list mangling.

- Add a test case pointing out that varargs functions aren't
  demangled correctly.

llvm-svn: 362540
2019-06-04 18:49:05 +00:00
Nemanja Ivanovic aed7227b71 Revert r362472 as it is breaking PPC build bots
The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots.
Reverting it to bring the bots back to green.

llvm-svn: 362539
2019-06-04 18:48:43 +00:00
Alina Sbirlea bfceed49ce [Utils] Clean another duplicated util method.
Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior as MergeBlockIntoPredecessor. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.

Reviewers: chandlerc, dmgreen

Subscribers: jlebar, javed.absar, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62751

llvm-svn: 362538
2019-06-04 18:45:15 +00:00
Nico Weber 878df1c2a9 llvm-undname: Add test coverage for demangleInitFiniStub()
llvm-svn: 362536
2019-06-04 18:06:28 +00:00
Craig Topper 137de38009 [X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations
We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For each of these 5 patterns we have to support 3 vector widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.

This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K.

There is one complication to this: the STRICT versions of these nodes are currently mutated to their non-strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need.

We can probably do something similar for scalar, but I haven't looked at it yet.

Differential Revision: https://reviews.llvm.org/D62757

llvm-svn: 362535
2019-06-04 18:03:07 +00:00
Benjamin Kramer 03ff1b3c30 [X86] Fold single-use variable into assert. NFC.
Avoids an unused variable warning in Release builds.
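
The general idiom being applied, as a generic illustration (computeSomething
is hypothetical, not the function touched by this commit):

#include <cassert>

bool computeSomething() { return true; }  // side-effect-free check

int main() {
  // Instead of:
  //   bool Ok = computeSomething();
  //   assert(Ok && "computeSomething failed");  // 'Ok' is unused when NDEBUG is set
  // fold the single use into the assert (safe here because the call is pure
  // and may legitimately be skipped in Release builds):
  assert(computeSomething() && "computeSomething failed");
  return 0;
}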

llvm-svn: 362534
2019-06-04 18:01:07 +00:00
Craig Topper 09a4415803 [DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1)
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove were beneficial.
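
As a quick sanity check of the identity behind the fold (not part of the
patch): in two's complement, -X equals ~X + 1, so ~(-X) collapses to X - 1.
A standalone check using unsigned arithmetic to keep the wraparound well
defined:

#include <cassert>
#include <cstdint>

int main() {
  // (not (neg X)) -> (add X, -1): with wrapping arithmetic, ~(0 - X) == X - 1.
  for (uint32_t X : {0u, 1u, 7u, 0x80000000u, 0xFFFFFFFFu}) {
    assert(~(0u - X) == X - 1u);
  }
  return 0;
}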

Fixes PR42118

Differential Revision: https://reviews.llvm.org/D62828

llvm-svn: 362533
2019-06-04 17:44:18 +00:00
Alex Brachet c33944832c [MACHO] Replaced calls to getStruct with getStructOrErr in functions returning Error or Expected or similar
llvm-svn: 362526
2019-06-04 16:55:30 +00:00
Sanjay Patel 606eb2367f [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
2019-06-04 16:40:04 +00:00
Peter Smith f15e3d856f [AArch64][ELF] Add support for PLT decoding with BTI instructions present
Arm Architecture v8.5a introduces Branch Target Identification (BTI). When
enabled all indirect branches must target a bti instruction of the
appropriate form. As PLT sequences may sometimes be the target of an
indirect branch and PLT[0] always is, a static linker may need to generate
PLT sequences that contain "bti c" as the first instruction. In effect:
bti     c
adrp    x16, page offset to .got.plt
...
Instead of:
adrp    x16, page offset to .got.plt
...
At present the PLT decoding assumes the adrp will always be the first
instruction. This patch adds support for a single "bti c" to prefix it. A
test binary has been uploaded with such a PLT sequence. A forthcoming LLD
patch will make heavy use of the PLT decoding code.

Differential Revision: https://reviews.llvm.org/D62598

llvm-svn: 362523
2019-06-04 16:35:40 +00:00
Nico Weber d98a0a362f llvm-undname: Yet more coverage for error paths
- For error returns in demangleSpecialTableNode(),
  demangleLocalStaticGuard(), RTTITypeDescriptor,
  demangleRttiBaseClassDescriptorNode(), demangleUnsigned(),
  demangleUntypedVariable() (via RttiBaseClassArray)

- For ?_A and ?_P which are handled at early levels of the
  demangler but are not implemented in a later stage; this
  is now more obvious

- Replace a "default:" with an explicit list of cases, to
  get -Wswitch check we list all cases

llvm-svn: 362520
2019-06-04 16:25:28 +00:00
Nikita Popov df621bdfc8 [LVI][CVP] Add support for urem, srem and sdiv
The underlying ConstantRange functionality has been added in D60952,
D61207 and D61238; this just exposes it for LVI.

I'm switching the code from using a whitelist to a blacklist, as
we're down to one unsupported operation here (xor) and writing it
this way seems more obvious :)

Differential Revision: https://reviews.llvm.org/D62822

llvm-svn: 362519
2019-06-04 16:24:09 +00:00
Nico Weber c1a0e6fe6b llvm-undname: More no-op changes to increase test coverage
- Add test coverage around invalid anon namespaces and
  for error paths in demanglePrimitiveType() and in
  demangleFullyQualifiedTypeName()

- Use DEMANGLE_UNREACHABLE in two more unreachable places

llvm-svn: 362514
2019-06-04 15:38:00 +00:00
Jinsong Ji 3144d7a2da [PowerPC] P9 Scheduling Model: dispatching rule fixes
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.

Instead of using a hypothetical DISPATCHER, we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.

All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.

Differential Revision: https://reviews.llvm.org/D61873

llvm-svn: 362509
2019-06-04 15:22:23 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Nico Weber 880d21d3cb llvm-undname: Several behavior-preserving changes to increase coverage
- Replace `Error = true` in a few branches that are truly unreachable
  with DEMANGLE_UNREACHABLE

- Remove early return early in startsWithLocalScopePattern() because
  it's redundant with the next two early returns

- Remove unreachable `case '0'` (it's handled in the branch below)

- Remove an unused bool return

- Add test coverage for several early error returns, mostly in
  array type parsing

llvm-svn: 362506
2019-06-04 15:13:30 +00:00
Simon Pilgrim a6e289e9f8 [X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI.
As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher.
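
For background on why this pattern is a conditional negate, a standalone
check (not code from the patch): when the per-lane mask M is all-ones,
(X ^ M) - M is -X, and when M is zero it is X.

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t X : {0u, 1u, 42u, 0xDEADBEEFu}) {
    for (uint32_t M : {0x00000000u, 0xFFFFFFFFu}) {
      uint32_t Negated  = (X ^ M) - M;     // the matched pattern
      uint32_t Expected = M ? 0u - X : X;  // select(M, -X, X)
      assert(Negated == Expected);
    }
  }
  return 0;
}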

llvm-svn: 362504
2019-06-04 15:02:33 +00:00
Shawn Landden 669775f9db [Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned
This matches APInt's versions of these functions, and there is no need for these to be size_t.

(as well as __builtin_clzll())
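
A tiny usage sketch against the 2019-era MathExtras.h API (the exact header
contents are assumed), showing the new unsigned return type:

#include "llvm/Support/MathExtras.h"
#include <cassert>
#include <cstdint>

int main() {
  // Bits 20..23 of the value are set, so 8 leading and 20 trailing zeros.
  unsigned LZ = llvm::countLeadingZeros<uint32_t>(0x00F00000u);
  unsigned TZ = llvm::countTrailingZeros<uint32_t>(0x00F00000u);
  assert(LZ == 8 && TZ == 20);
  return 0;
}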

Differential Revision: https://reviews.llvm.org/D60823

llvm-svn: 362503
2019-06-04 14:51:15 +00:00
Dmitri Gribenko 454fc77872 Include what you use in PPCRegisterInfo.cpp
llvm-svn: 362495
2019-06-04 12:55:00 +00:00
Peter Smith 49d7221f71 [AArch64][ELF][llvm-readobj] Add support for BTI and PAC dynamic tags
ELF for the 64-bit Arm Architecture defines two processor-specific dynamic
tags:
DT_AARCH64_BTI_PLT 0x70000001, d_val
DT_AARCH64_PAC_PLT 0x70000003, d_val

The presence of these tags indicates that PLT sequences have been
protected using Branch Target Identification and Pointer Authentication
respectively. The presence of both indicates that the PLT sequences have
been protected with both Branch Target Identification and Pointer
Authentication.

This patch adds the tags and tests for llvm-readobj and yaml2obj.

As some of the processor specific dynamic tags overlap, this patch splits
them up, keeping their original default value if they were not previously
mentioned explicitly in a switch case.

Differential Revision: https://reviews.llvm.org/D62596

llvm-svn: 362493
2019-06-04 11:44:33 +00:00
Roman Lebedev 3dce0326fe [DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952)
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but I'm not sure yet.

As far as I can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.

This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/vMd3
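
For reference, a standalone check of the underlying identity under wrapping
(unsigned) arithmetic, which is what the DAG fold relies on (not part of the
patch):

#include <cassert>
#include <cstdint>

int main() {
  const uint32_t C = 1000u;  // stands in for the constant operand
  for (uint32_t x : {0u, 3u, 0xFFFFFFF0u}) {
    for (uint32_t y : {0u, 7u, 0x80000000u}) {
      // (C - x) - y == C - (x + y) holds modulo 2^32.
      assert((C - x) - y == C - (x + y));
    }
  }
  return 0;
}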

Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62774

llvm-svn: 362488
2019-06-04 11:06:21 +00:00
Roman Lebedev be6ce7b3f2 [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold
Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M

The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.

Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma

Reviewed By: efriedma

Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62266

llvm-svn: 362487
2019-06-04 11:06:08 +00:00
Simon Pilgrim ad298f86b7 [SelectionDAG] ComputeNumSignBits - support constant pool values from target
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.

The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.

It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.

Differential Revision: https://reviews.llvm.org/D62777

llvm-svn: 362486
2019-06-04 10:49:06 +00:00
Simon Pilgrim 3178546a27 [SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI.
Pre-commit requested for D62777.

llvm-svn: 362485
2019-06-04 10:17:56 +00:00
Owen Reynolds 5d5078e341 [llvm-ar] Reapply Fix relative thin archive path handling
Includes a fix for an introduced build failure due to a post-C++11 use of std::mismatch.

This fixes some thin archive relative path issues: paths are shortened where possible, and paths are output correctly when using the display table command.

Differential Revision: https://reviews.llvm.org/D59491

llvm-svn: 362484
2019-06-04 10:13:03 +00:00
Simon Pilgrim 3018d505a3 [SelectionDAG] Add fpto[us]i(undef) --> undef constant fold
Follow up to D62807.

Differential Revision: https://reviews.llvm.org/D62811

llvm-svn: 362483
2019-06-04 10:04:55 +00:00
Mikhail Maltsev 08da01b496 [ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
  and will not be implementable in pure tablegen

Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
  instruction
* an FP16 load and insertion into a single VLD1 instruction

Differential Revision: https://reviews.llvm.org/D62651

llvm-svn: 362482
2019-06-04 09:39:55 +00:00
Dmitri Gribenko 63846039f5 Silenced a warning "implicit conversion turns string literal into bool" introduced in r362473
llvm-svn: 362480
2019-06-04 09:31:07 +00:00
Dmitri Gribenko 73a15d4b78 Include what you use in PPC.h
llvm-svn: 362477
2019-06-04 09:16:35 +00:00
Dmitri Gribenko 067a17b51d Include what you use in PPCMachineScheduler.cpp
llvm-svn: 362476
2019-06-04 09:16:31 +00:00
Dmitri Gribenko 9d1c5ea165 Include what you use in PPCRegisterInfo.h
llvm-svn: 362475
2019-06-04 09:13:08 +00:00
Yevgeny Rouban 4f9e68148b Make SwitchInstProfUpdateWrapper safer
While prof branch_weights inconsistencies are being fixed patch
by patch (pass by pass) we need SwitchInstProfUpdateWrapper to
be safe with respect to inconsistent metadata that can come from
passes that have not been fixed yet. See the bug found by @nikic
in https://reviews.llvm.org/D62126.

This patch introduces one more state (called Invalid) to the
wrapper class that allows users to work with the underlying
SwitchInst ignoring the prof metadata changes.

Created a unit test for the SwitchInstProfUpdateWrapper class.

Reviewers: davidx, nikic, eraman, reames, chandlerc
Reviewed By: davidx
Differential Revision: https://reviews.llvm.org/D62656

llvm-svn: 362473
2019-06-04 09:03:39 +00:00
QingShan Zhang 11de0e71b0 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the target
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472
2019-06-04 08:53:53 +00:00
Simon Tatham ac02445524 [ARM] Turn some undefined encoding bits into 0s.
The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.

In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.


Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma

Reviewed By: dmgreen, efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60705

llvm-svn: 362470
2019-06-04 08:28:48 +00:00
Cameron McInally 89f9af5487 [SCCP] Add UnaryOperator visitor to SCCP for unary FNeg
Differential Revision: https://reviews.llvm.org/D62819

llvm-svn: 362449
2019-06-03 21:53:56 +00:00
Michael Berg 6ff978ee05 Propagate fmf for setcc in SDAG for select folds
llvm-svn: 362448
2019-06-03 21:53:26 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Jessica Paquette 7500c97ce4 [AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.

For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.

Also add a test.

Differential Revision: https://reviews.llvm.org/D62695

llvm-svn: 362446
2019-06-03 20:47:20 +00:00
George Burgess IV c24a2f4ad9 CFLAA: reflow comments; NFC
llvm-svn: 362442
2019-06-03 19:56:22 +00:00
Craig Topper 7a4eabef39 [CFLGraph] Add FAdd to visitConstantExpr.
This looks like an oversight as all the other binary operators are present.

Accidentally noticed while auditing places that need FNeg handling.

No test because as noted in the review it would be contrived and amount to "don't crash"

Differential Revision: https://reviews.llvm.org/D62790

llvm-svn: 362441
2019-06-03 19:35:52 +00:00
Craig Topper dcf865f0ca [X86] Fix the pattern for merge masked vcvtps2pd.
r362199 fixed it for zero masking, but not merge masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.

llvm-svn: 362440
2019-06-03 19:29:14 +00:00
Michael Berg 0b7f98da65 Propagate fmf for setcc/select folds
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439
2019-06-03 19:12:15 +00:00
Nemanja Ivanovic bad43d8f49 [PowerPC] Look through copies for compare elimination
We currently miss the opportunities for optimizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.

This patch simply lets the optimization peek through copies.

Differential revision: https://reviews.llvm.org/D59633

llvm-svn: 362438
2019-06-03 19:09:15 +00:00
Matt Arsenault 8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Nikita Popov c061b99c5b [ConstantRange] Add sdiv() support
The implementation is conceptually simple: We separate the LHS and
RHS into positive and negative components and then also compute the
positive and negative components of the result, taking into account
that e.g. only pos/pos and neg/neg will give a positive result.

However, there's one significant complication: SignedMin / -1 is UB
for sdiv, and we can't just ignore it, because the APInt result of
SignedMin would break the sign segregation. Instead we drop SignedMin
or -1 from the corresponding ranges, taking into account some edge
cases with wrapped ranges.

Because of the sign segregation, the implementation ends up being
nearly fully precise even for wrapped ranges (the remaining
imprecision is due to ranges that are both signed and unsigned
wrapping and are divided by a trivial divisor like 1). This means
that the testing cannot just check the signed envelope as we
usually do. Instead we collect all possible results in a bitvector
and construct a better sign wrapped range (than the full envelope).
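
A hedged usage sketch (the exact bounds sdiv() returns are an assumption;
any conforming result must at least contain all element-wise quotients):

#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include <cassert>

int main() {
  using llvm::APInt;
  using llvm::ConstantRange;
  // LHS = [10, 20), RHS = [2, 3): every true quotient lies in [5, 9].
  ConstantRange LHS(APInt(8, 10), APInt(8, 20));
  ConstantRange RHS(APInt(8, 2), APInt(8, 3));
  ConstantRange Res = LHS.sdiv(RHS);
  // Whatever the exact bounds, the result must cover the true quotients.
  assert(Res.contains(APInt(8, 5)) && Res.contains(APInt(8, 9)));
  return 0;
}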

Differential Revision: https://reviews.llvm.org/D61238

llvm-svn: 362430
2019-06-03 18:19:54 +00:00
Andrew Kaylor 4172dbab5d Fix a crash when the default of a switch is removed
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.

Differential Revision: https://reviews.llvm.org/D62560

llvm-svn: 362427
2019-06-03 17:54:15 +00:00
Dmitri Gribenko 26c43d0ef8 Include what you use in Lanai.h
Other files were not relying on these transitive includes, so I'm
submitting this change separately.

llvm-svn: 362423
2019-06-03 17:02:15 +00:00
Dmitri Gribenko b8aeaf882e Include what you use in LanaiAsmPrinter.cpp
llvm-svn: 362422
2019-06-03 17:02:07 +00:00
Dmitri Gribenko dc136847e3 Include what you use in LanaiMemAluCombiner.cpp
llvm-svn: 362421
2019-06-03 17:02:02 +00:00
Dmitri Gribenko f4d22bd0b4 Include what you use in LanaiISelDAGToDAG.cpp
llvm-svn: 362420
2019-06-03 17:01:57 +00:00
Dmitri Gribenko 179154f6b9 Include what you use in LanaiFrameLowering.{cpp,h}
llvm-svn: 362419
2019-06-03 17:01:52 +00:00
Dmitri Gribenko 8e317e29da Include what you use in LanaiRegisterInfo.cpp
llvm-svn: 362416
2019-06-03 16:31:37 +00:00
Philip Reames 9ed1673703 [LoopPred] Convert a second member function to a static helper [NFC]
(And remember to actually mark the first one static.)

llvm-svn: 362415
2019-06-03 16:23:20 +00:00
Dmitri Gribenko 857de979a7 Revert "[llvm-ar] Fix relative thin archive path handling"
This reverts commit r362407.  It broke compilation of
llvm/lib/Object/ArchiveWriter.cpp:

error: type 'llvm::sys::path::const_iterator' does not provide a call
operator

llvm-svn: 362413
2019-06-03 16:21:37 +00:00
Nemanja Ivanovic 009d08f313 [PowerPC] Set PROT_READ flag for MF_EXEC to prevent segfaults on PPC machines
The big endian PPC buildbots are all failing now due to calls to cache
invalidation in unit tests on data that has only the PROT_EXEC flag set.
This has been an issue all along on FreeBSD but it can affect Linux machines
depending on configuration.

This patch mitigates the issue the same way it is mitigated on FreeBSD.

Since this is needed to bring the buildbots back to green, I plan to commit this
and allow for post-commit review, but I thought I would also post it here for
ease of access/readability.

Differential revision: https://reviews.llvm.org/D62741

llvm-svn: 362412
2019-06-03 16:20:59 +00:00
Philip Reames 0912b06f78 [LoopPred] Convert member function to free helper function [NFC]
llvm-svn: 362411
2019-06-03 16:17:14 +00:00
Dmitri Gribenko bedcaea99a Include what you use in LanaiInstrInfo.cpp
llvm-svn: 362408
2019-06-03 15:26:25 +00:00
Owen Reynolds fade9cbed7 [llvm-ar] Fix relative thin archive path handling
This fixes some thin archive relative path issues: paths are shortened where possible, and paths are output correctly when using the display table command.

Differential Revision: https://reviews.llvm.org/D59491

llvm-svn: 362407
2019-06-03 15:26:07 +00:00
Dmitri Gribenko b3bd866c7f Include what you use in PPCInstrInfo.h
llvm-svn: 362405
2019-06-03 15:04:05 +00:00
Dmitri Gribenko 2b369f83c5 Include what you use in NVPTX.h
Other files were not relying on these transitive includes, so I'm
submitting this change separately.

llvm-svn: 362403
2019-06-03 14:37:26 +00:00
Dmitri Gribenko 14c69fefe6 Include what you use in NVPTX.h
I also fixed all other files that were including NVPTX.h and were
relying on transitive includes.

llvm-svn: 362402
2019-06-03 14:26:50 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Dmitri Gribenko 7a3e4ab286 Include what you use in LanaiInstPrinter.cpp
llvm-svn: 362395
2019-06-03 12:53:05 +00:00
Dmitri Gribenko 2f66316c96 Include what you use in LanaiMCCodeEmitter.cpp
LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Lanai target library and
the MCTargetDesc library).

llvm-svn: 362394
2019-06-03 12:42:48 +00:00
Dmitri Gribenko c69ee63cb9 Include what you use in LanaiDisassembler.cpp
llvm-svn: 362392
2019-06-03 12:37:11 +00:00
Nicolai Haehnle edfa756f3f AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand
Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da

Reviewers: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62720

Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e
llvm-svn: 362390
2019-06-03 12:07:41 +00:00
Dmitri Gribenko 8668fc0102 Include what you use in HexagonInstPrinter.cpp
HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h.
Doing so is problematic from include-what-you-use perspective, but it is
also a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362389
2019-06-03 11:41:22 +00:00
Dmitri Gribenko 61b49ccb77 Include what you use in HexagonAsmPrinter.h
llvm-svn: 362388
2019-06-03 11:41:18 +00:00
Dmitri Gribenko 03d1b33041 Include what you use in HexagonMCInstrInfo.cpp
HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362387
2019-06-03 11:25:37 +00:00
Dmitri Gribenko 970b9f961f Include what you use in HexagonMCCodeEmitter.cpp
HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h.  Doing
so is problematic from include-what-you-use perspective, but it is also
a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362386
2019-06-03 11:20:53 +00:00
Dmitri Gribenko ebe360edfa Include what you use in HexagonMCCompound.cpp
HexagonMCCompound.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362385
2019-06-03 11:20:48 +00:00
Dmitri Gribenko 6e076a081a Include what you use in HexagonShuffler.cpp
HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).

llvm-svn: 362384
2019-06-03 11:14:20 +00:00
Dmitri Gribenko 6214b577b7 Include what you use in HexagonMCChecker.cpp
HexagonMCChecker.cpp was not using any APIs from Hexagon.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362383
2019-06-03 11:14:15 +00:00
Dmitri Gribenko bf2a356ec0 Include what you use in HexagonMCTargetDesc.cpp
HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362382
2019-06-03 11:14:10 +00:00
Dmitri Gribenko beb7f48a29 Include what you use in HexagonMCShuffler.cpp
HexagonMCShuffler.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362381
2019-06-03 11:14:05 +00:00
Simon Tatham dc83a3c449 [ARM] Fix recent breakage of -mfpu=none.
The recent change D60691 introduced a bug in clang when handling
option combinations such as `-mcpu=cortex-m4 -mfpu=none`. Those
options together should select Cortex-M4 but disable all use of
hardware FP, but in fact, now hardware FP instructions can still be
generated in that mode.

The reason is because the handling of FPUVersion::NONE disables all
the same feature names it used to, of which the base one is `vfp2`.
But now there are further features below that, like `vfp2d16fp` and
(following D60694) `fpregs`, which also need to be turned off to
disable hardware FP completely.

Added a tiny test which double-checks that compiling a simple FP
function doesn't access the FP registers.

Reviewers: SjoerdMeijer, dmgreen

Reviewed By: dmgreen

Subscribers: lebedev.ri, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62729

llvm-svn: 362380
2019-06-03 11:02:53 +00:00
Dmitri Gribenko 7ebfbebfe1 Include what you use in HexagonELFObjectWriter.cpp
HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and
was only including it for transitive dependencies.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362376
2019-06-03 09:56:40 +00:00
Nikola Prica 2d0106a110 [LiveDebugValues] Close range for previous variable's location when adding newly deduced location
When LiveDebugValues deduces a new variable location from a spill, restore or
register copy instruction, it should close the old variable location. Otherwise
we can have multiple block output locations for the same variable. That could lead
to inserting two DBG_VALUEs for the same variable at the beginning of the successor
block, which results in the first DBG_VALUE being ignored.

Reviewers: aprantl, jmorse, wolfgangp, dstenb

Reviewed By: aprantl

Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D62196

llvm-svn: 362373
2019-06-03 09:48:29 +00:00
Dmitri Gribenko 0aa374a306 Include what you use in HexagonAsmBackend.cpp
HexagonAsmBackend.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362372
2019-06-03 09:43:05 +00:00
Dmitri Gribenko 301f8fd632 Include what you use in HexagonAsmParser.cpp
HexagonAsmParser.cpp was not using any APIs from Hexagon.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the AsmParser library).

llvm-svn: 362370
2019-06-03 09:38:48 +00:00
Dmitri Gribenko c5327ab71d Include what you use in HexagonShuffler.h
HexagonShuffler.h was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).

llvm-svn: 362369
2019-06-03 09:33:48 +00:00
Dmitri Gribenko 3c837201e0 Include what you use in BPFMCTargetDesc.cpp
BPFMCTargetDesc.cpp was not using any APIs from BPF.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
BPF target library and the MCTargetDesc library).

llvm-svn: 362368
2019-06-03 09:29:51 +00:00
Diogo N. Sampaio df92f84110 [ARM][FIX] Ran out of registers due tail recursion
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to the same function. However,
it disregards the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (an LLVM limitation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, so as to avoid tail call
optimization.

Reviewers: dmgreen, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62683

llvm-svn: 362366
2019-06-03 08:58:05 +00:00
Sam Parker a0bd6f8a1a [AArch64] Check for simple type in FPToUInt
DAGCombiner was hitting a SimpleType assertion when trying to combine
a v3f32 before type legalization.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916

Differential Revision: https://reviews.llvm.org/D62734

llvm-svn: 362365
2019-06-03 08:49:17 +00:00
Jim Lin 20b14dacbb [AVR] Fix incorrect source regclass of LDWRdPtr
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts Y or Z as the pointer register.
So the register class of the pointer operand of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: dylanmckay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62300

llvm-svn: 362351
2019-06-03 02:31:07 +00:00
Florian Hahn e71963c850 Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.

Reviewers: niravd, spatel, craig.topper, rupprecht

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D62633

llvm-svn: 362350
2019-06-03 01:30:19 +00:00
Nico Weber 54362477c7 llvm-undname: Add more test coverage for demangleFunctionClass()
Also add two FC_Far entries that seem to be missing, by symmetry with
the public and protected cases. (But FC_Far isn't really a thing
anymore, so this doesn't really have an observable effect.)

llvm-svn: 362344
2019-06-02 23:26:57 +00:00
Craig Topper 50b35caf30 [DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask.
Similar to what was done for masked load and gather.

llvm-svn: 362342
2019-06-02 22:52:38 +00:00
Craig Topper 5f79d74946 [X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin
Need to cast only to Constant instead of ConstantVector to allow
ConstantAggregateZero.

llvm-svn: 362341
2019-06-02 22:52:34 +00:00
Simon Pilgrim 8a32ca381d [CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops; more realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse, but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

llvm-svn: 362338
2019-06-02 20:37:02 +00:00
Craig Topper a7bc31ebc6 [DAGCombiner] Replace masked loads with a zero mask with the passthru value
Similar to what was recently done for gathers in r362015.

llvm-svn: 362337
2019-06-02 18:58:46 +00:00
Simon Pilgrim 59a8db628b [TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI.
Prep work before resurrecting D61257.

llvm-svn: 362335
2019-06-02 18:06:42 +00:00
Nico Weber b5cd6163f4 Remove code path that's dead after r358835
llvm-svn: 362333
2019-06-02 17:41:07 +00:00
Simon Pilgrim 71a39bcf68 [X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)
Lets us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.)

llvm-svn: 362327
2019-06-02 15:47:49 +00:00
Simon Pilgrim 7a869e7036 [DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.

Differential Revision: https://reviews.llvm.org/D59188

llvm-svn: 362324
2019-06-02 14:42:11 +00:00
Simon Pilgrim ffb4d2bff7 [DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020)
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.

PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector, which might not be what we want in isBitwiseNot, so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it in individual cases where it's safe.

Differential Revision: https://reviews.llvm.org/D62783

llvm-svn: 362323
2019-06-02 11:56:39 +00:00
Simon Pilgrim 88522ce388 [TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis.
These might have been replaced in multiple use cases.

llvm-svn: 362322
2019-06-02 10:12:55 +00:00
Simon Pilgrim 30a6caa3e7 [TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI.
Helps with debugging as we recurse between them.

llvm-svn: 362321
2019-06-02 10:03:56 +00:00
Craig Topper f58ef87bb7 [DAGCombiner] Replace two unchecked dyn_casts with casts.
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.

I don't think there's any way for these dyn_casts to fail, so use a cast
instead of adding a null check.

llvm-svn: 362315
2019-06-02 03:31:01 +00:00
Craig Topper 78c794a70b [X86] Fix several places that weren't passing what they thought they were to MachineInstr::print
Over a year ago, MachineInstr gained a fourth boolean parameter that occurs
before the TII pointer. When this happened, several places started accidentally
passing TII into this boolean parameter instead of the TII parameter.

llvm-svn: 362312
2019-06-02 01:36:48 +00:00
Craig Topper 396a915c26 [X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.
llvm-svn: 362309
2019-06-02 00:42:58 +00:00
Nikita Popov 900578d1c1 [SimplifyIndVar] Refactor overflow check elimination code; NFC
Extract a willNotOverflow() helper function that is shared between
eliminateOverflowIntrinsic() and strengthenOverflowingOperation().
Use WithOverflowInst for the former.

We'll be able to reuse the same code for saturating intrinsics as
well.

llvm-svn: 362305
2019-06-01 20:21:53 +00:00
Craig Topper 7cebf0af40 [InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x
Summary: Fneg can be implemented with an xor rather than a function call, so we don't need to add the function call overhead. This was pointed out in D62699.
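
A minimal C++ sketch of the idea, assuming IEEE-754 single precision
(the function name is made up; this is not LLVM's code): negation only
flips the sign bit, so no libcall is needed.

    #include <cstdint>
    #include <cstring>

    // Negate a float by flipping its sign bit (the fsub -0.0, %x idiom).
    float fnegViaXor(float X) {
      uint32_t Bits;
      std::memcpy(&Bits, &X, sizeof(Bits));
      Bits ^= UINT32_C(0x80000000); // flip only the sign bit
      std::memcpy(&X, &Bits, sizeof(Bits));
      return X;
    }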

Reviewers: efriedma, cameron.mcinally

Reviewed By: efriedma

Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62747

llvm-svn: 362304
2019-06-01 19:40:07 +00:00
Andrea Di Biagio 6a989c358c [MCA][Scheduler] Change how memory instructions are dispatched to the pending set. NFCI
llvm-svn: 362302
2019-06-01 15:22:37 +00:00
Simon Atanasyan 25694e0084 [mips] Extend range of register indexes accepted by cfcmsa/ctcmsa
The `cfcmsa` and `ctcmsa` instructions accept the index of an MSA
control register. The MIPS64 SIMD Architecture defines eight MSA
control registers, but the register index for `cfcmsa` and `ctcmsa`
instructions might be any number in the 0..31 range. If the index is
greater than 7, `cfcmsa` writes zero to the destination register and
`ctcmsa` does nothing [1].

[1] MIPS Architecture for Programmers Volume IV-j:
    The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module

Differential Revision: https://reviews.llvm.org/D62597

llvm-svn: 362299
2019-06-01 13:55:18 +00:00
Dylan McKay 45eb4c7e55 [AVR] Disable register coalescing to the PTRDISPREGS class
If we allowed register coalescing on the PTRDISPREGS class, then the register
allocator could lock the Z register to some virtual register. Larger
instructions requiring a memory access would then fail during the register
allocation phase, since there is no available register to hold a pointer if
the Y register was already taken for a stack frame. This patch prevents that
by keeping the Z register spillable, which it does by not allowing the
coalescer to lock it.

Original discussion on https://github.com/avr-rust/rust/issues/128.

llvm-svn: 362298
2019-06-01 12:38:56 +00:00
Nikita Popov 46d4dba6e6 [IndVarSimplify] Fixup nowrap flags during LFTR (PR31181)
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.

When LFTR moves a condition from pre-inc to post-inc, it may now
depend on a value that is poison due to nowrap flags. To avoid this,
we clear any nowrap flag that SCEV cannot prove for the post-inc
addrec.

Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)

Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.

Differential Revision: https://reviews.llvm.org/D60935

llvm-svn: 362292
2019-06-01 09:40:18 +00:00
Dylan McKay 038e3b9f57 Extend the DWARFExpression address handling to support 16-bit addresses
This allows the DWARFExpression class to handle addresses without
crashing on targets with 16-bit pointers like AVR.

This is required in order to generate assembly from clang via the '-S'
flag.

This fixes an error with the following message:

clang: llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h:132: llvm::DWARFExpression::DWARFExpression(llvm::DataExtractor, uint16_t, uint8_t):
       Assertion `AddressSize == 8 || AddressSize == 4' failed.
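
A hedged C++ sketch of the relaxed check (illustrative only; the actual
constructor lives in DWARFExpression.h): 2-byte addresses are accepted
alongside the existing 4- and 8-byte ones.

    #include <cassert>
    #include <cstdint>

    void checkAddressSize(uint8_t AddressSize) {
      // 16-bit targets such as AVR use 2-byte addresses.
      assert((AddressSize == 2 || AddressSize == 4 || AddressSize == 8) &&
             "unsupported address size");
      (void)AddressSize; // keep release builds warning-free
    }
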
llvm-svn: 362290
2019-06-01 09:18:26 +00:00
Craig Topper c288a19bb7 [X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables.
llvm-svn: 362288
2019-06-01 06:20:59 +00:00
Craig Topper 48fdb61766 [X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff.
Fix a few other formatting issues in the manual table. And remove some
old FIXMEs.

llvm-svn: 362287
2019-06-01 06:20:55 +00:00
Nick Desaulniers b380846a12 [RuntimeDyld] fix too-small-bitmask error
Summary:
This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No.
33".

It seems that this statement is doing the standard bitwise trick for
adjusting a value to have a specific alignment.

The issue is that getStubAlignment() returns an unsigned, while DataSize
is declared as a uint64_t. The right-hand side of the expression is not
extended to 64 bits before bitwise negation, resulting in the top half of
the mask being 0s, which is not correct for realignment.
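
A small standalone C++ illustration of the width mismatch, with made-up
values and names (not the actual RuntimeDyld code):

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t DataSize = 0x100000005ULL; // over 4 GiB, purely illustrative
      unsigned StubAlign = 8;             // 32-bit alignment value
      // Buggy: ~(StubAlign - 1) is a 32-bit mask that zero-extends to
      // 0x00000000FFFFFFF8 and wipes the upper half of DataSize.
      uint64_t Bad = (DataSize + StubAlign - 1) & ~(StubAlign - 1);
      // Fixed: widen before negating so the mask keeps its upper bits set.
      uint64_t Good = (DataSize + StubAlign - 1) & ~uint64_t(StubAlign - 1);
      std::printf("bad: %#llx  good: %#llx\n", (unsigned long long)Bad,
                  (unsigned long long)Good);
      return 0;
    }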

Reviewers: lhames, MaskRay

Reviewed By: MaskRay

Subscribers: RKSimon, MaskRay, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62227

llvm-svn: 362286
2019-06-01 04:51:26 +00:00
Richard Trieu 4e875464df Inline variable into assert to fix unused variable warning.
llvm-svn: 362285
2019-06-01 03:32:20 +00:00
Philip Reames 19afdf74bb [LoopPred] Eliminate a redundant/confusing cover function [NFC]
llvm-svn: 362284
2019-06-01 03:09:28 +00:00
Philip Reames 099eca832e [LoopPred] Handle a subset of NE comparison based latches
At the moment, LoopPredication completely bails out if it sees a latch of the form:
%cmp = icmp ne %iv, %N
br i1 %cmp, label %loop, label %exit
OR
%cmp = icmp ne %iv.next, %NPlus1
br i1 %cmp, label %loop, label %exit

This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can.

For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially.

For post-increment forms, the key insight is remembering that LFTR had to insert an (N+1) for the RHS. CVP can hopefully prove that add to be nsw/nuw (if there's an appropriate range on N to start with). This leaves us with both the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem.

This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later.

Differential Revision: https://reviews.llvm.org/D62748

llvm-svn: 362282
2019-06-01 00:31:58 +00:00
Eli Friedman d8e8722791 [CodeGen] Fix hashing for MO_ExternalSymbol MachineOperands.
We were hashing the string pointer, not the string, so two instructions
could be identical (isIdenticalTo), but have different hash codes.
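
A minimal C++ illustration of the pitfall, with std::hash standing in
for the real hashing and made-up names:

    #include <cstdio>
    #include <functional>
    #include <string>

    int main() {
      std::string A = "external_sym";
      std::string B = "external_sym"; // equal contents, distinct storage
      bool PtrHashEqual = std::hash<const void *>{}(A.data()) ==
                          std::hash<const void *>{}(B.data());
      bool ValueHashEqual = std::hash<std::string>{}(A) ==
                            std::hash<std::string>{}(B);
      // Hashing the pointer breaks "equal values hash equally".
      std::printf("by pointer: %d, by value: %d\n", PtrHashEqual,
                  ValueHashEqual);
      return 0;
    }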

This showed up as a very rare, non-deterministic assertion failure
rehashing a DenseMap constructed by MachineOutliner.  So there's no
"real" testcase, just a unittest which checks that the hash function
behaves correctly.

I'm a little scared fixing this is going to cause a regression in
outlining or MachineCSE, but hopefully we won't run into any issues.

Differential Revision: https://reviews.llvm.org/D61975

llvm-svn: 362281
2019-06-01 00:08:54 +00:00
Tom Tan eb4d6142dc [COFF, ARM64] Add CodeView register mapping
CodeView has its own register map, which is defined in cvconst.h. Missing this
mapping when saving registers to CodeView causes the debugger to show incorrect
values for all register-based variables, like variables in registers and local
variables addressed by a register (stack pointer + offset).

This change adds a mapping between LLVM registers and CodeView registers so the
correct register number is stored to CodeView/PDB. It also fixes the mapping
from CodeView register number to register name based on the current CPUType,
but printing a PDB to YAML still assumes an X86 CPU and needs to be fixed.

Differential Revision: https://reviews.llvm.org/D62608

llvm-svn: 362280
2019-05-31 23:43:31 +00:00
Nick Desaulniers 7fcad2f171 [PowerPC] check for INLINEASM_BR along w/ INLINEASM
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM (r353563),
a few checks for INLINEASM needed to be updated to check for either
case.

pr/41999

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62403

llvm-svn: 362278
2019-05-31 23:02:13 +00:00
Reid Kleckner eddd6c25b5 [codeview] Revert inline line table change of r362264
Testing with debuggers shows that our previous behavior was correct.
The reason I thought MSVC did things differently is that MSVC prefers to
use the 0xB combined code offset and code length update opcode when
inline sites are discontiguous.

Keep the test changes, and update the llvm-pdbutil inline line table
dumper to account for this new interpretation of the opcodes.

llvm-svn: 362277
2019-05-31 22:55:03 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Erik Pilkington abb2a93c53 [SimplifyLibCalls] Fold more fortified functions into non-fortified variants
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.
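
A hedged C++ illustration of the equivalence, using the GCC/Clang
builtin directly (the function names are made up; this is not the
SimplifyLibCalls code):

    #include <cstddef>
    #include <cstring>

    // Before: a fortified call whose object size is unknown ((size_t)-1),
    // so the runtime check can never fire.
    void copyChecked(void *Dst, const void *Src, std::size_t N) {
      __builtin___memcpy_chk(Dst, Src, N, (std::size_t)-1);
    }

    // After: the plain libc call the optimizer can fold it into.
    void copyPlain(void *Dst, const void *Src, std::size_t N) {
      std::memcpy(Dst, Src, N);
    }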

rdar://50797197

Differential revision: https://reviews.llvm.org/D62358

llvm-svn: 362272
2019-05-31 22:41:36 +00:00
Erik Pilkington 5234921119 NFC: Pull out a function to reduce some duplication
Part of https://reviews.llvm.org/D62358

llvm-svn: 362271
2019-05-31 22:41:31 +00:00
Craig Topper bc9e04d0c3 [SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.

Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.

llvm-svn: 362269
2019-05-31 22:18:45 +00:00
Nick Desaulniers 103bd108a7 [RegisterCoalescer] fix potential use of undef value. NFC
Summary:
Fixes a warning produced by scan-build (llvm.org/reports/scan-build/);
further warnings were found by annotating isMoveInstr with [[nodiscard]].

isMoveInstr potentially does not assign to its parameters, so if they
were uninitialized, they will potentially stay uninitialized.  It seems
most call sites pass references to uninitialized values, then use them
without checking the return value.
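
A short C++ sketch of the hazard with made-up names (isMoveInstr's real
signature differs):

    // If the helper bails out early, its out-parameters are never written.
    [[nodiscard]] bool looksLikeMove(int Opcode, int &Src, int &Dst) {
      if (Opcode < 0)
        return false; // Src and Dst are left untouched on this path
      Src = Opcode;
      Dst = Opcode + 1;
      return true;
    }

    void caller(int Opcode) {
      int Src, Dst;
      if (!looksLikeMove(Opcode, Src, Dst)) // check before using them
        return;
      (void)(Src + Dst); // guaranteed to be initialized here
    }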

Reviewers: wmi

Reviewed By: wmi

Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62109

llvm-svn: 362265
2019-05-31 21:20:13 +00:00
Reid Kleckner e98cf5fe47 [codeview] Fix inline line table accuracy for discontiguous segments
After improving the inline line table dumper in llvm-pdbutil and looking
at MSVC's inline line tables, it is clear that setting the length of the
inlined code region does not update the code offset. This means that the
delta to the beginning of a new discontiguous inlined code region should
be calculated relative to the last code offset, excluding the length.
Implementing this is a one line fix for MC: simply don't update
LastLabel.

While I'm updating these test cases, switch them to use llvm-objdump -d
and llvm-pdbutil. This allows us to show offsets of each instruction and
correlate the line table offsets to the actual code.

llvm-svn: 362264
2019-05-31 20:55:31 +00:00
Nikita Popov 7bafae55c0 Reapply [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.
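
A hedged C++ sketch of why the transform is sound; the types and ranges
are illustrative, and CVP itself works on LLVM IR and ConstantRange
rather than on code like this:

    #include <algorithm>
    #include <cstdint>

    // Reference semantics of an i8 saturating add (llvm.sadd.sat.i8).
    int8_t saddSatI8(int8_t A, int8_t B) {
      int Sum = int(A) + int(B);
      return int8_t(std::min(127, std::max(-128, Sum)));
    }

    // If range analysis proves A is in [0, 10] and B is in [0, 20], the sum
    // is at most 30, saturation can never trigger, and saddSatI8(A, B) can
    // be replaced by the plain addition int8_t(A + B).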

Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362263
2019-05-31 20:48:26 +00:00
Nikita Popov 23a02f6a5f [CVP] Fix assertion failure on vector with.overflow
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.

llvm-svn: 362261
2019-05-31 20:42:07 +00:00
Craig Topper c669629e6c [X86] Resync Host.cpp with compiler-rt's cpu_model.c to enable 0x55 to be identified as cascadelake when avx512vnni is detected.
Some other formatting changes.

llvm-svn: 362256
2019-05-31 19:18:07 +00:00
Nikita Popov ccb63e0bfe Revert "[CVP] Simplify non-overflowing saturating add/sub"
This reverts commit 1e692d1777.

Causes assertion failure in builtins-wasm.c clang test.

llvm-svn: 362254
2019-05-31 19:04:47 +00:00
Puyan Lotfi 3ea6b24f41 [MIR-Canon] Don't do vreg skip for independent instructions if there are none.
We don't want to create vregs if there is nothing to use them for; doing so
causes verifier errors.

Differential Revision: https://reviews.llvm.org/D62740

llvm-svn: 362247
2019-05-31 17:34:25 +00:00
Nikita Popov 1e692d1777 [CVP] Simplify non-overflowing saturating add/sub
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.

Differential Revision: https://reviews.llvm.org/D62703

llvm-svn: 362242
2019-05-31 16:46:05 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect the vector it will unroll to have a chain, so it has an assert that prevents it from running in this case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Stanislav Mekhanoshin fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses a multiplier of 9 for the inline cost. It is taken into account
everywhere except for the inline hint threshold. As a result we are penalizing
functions with the inline hint, making them less likely to be inlined
than those without the hint. The defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have an effective
threshold of 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Guozhi Wei c3a24e93d5 [PPC] Correctly adjust branch probability in PPCReduceCRLogicals
In PPCReduceCRLogicals, after splitting the original MBB into 2, the 2 impacted branches still use the original branch probability. This is unreasonable. Suppose we have the following code, and the probability of each successor is 50%.

    condc = conda || condb
    br condc, label %target, label %fallthrough

It can be transformed to following,

    br conda, label %target, label %newbb
  newbb:
    br condb, label %target, label %fallthrough

Since each branch now has a probability of 50% to each successor, the total probability to %fallthrough becomes 25%, and the total probability to %target becomes 75%. This effectively changes the original profiling data. A more reasonable choice is to set the probability of the false side of each branch instruction to 70%, so the total probability to %fallthrough stays close to 50%.

This patch assumes the branch target with two incoming edges has the same edge frequency on each edge, computes a new probability for each target, and keeps the total probability to the original targets unchanged.
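
A quick numeric check of the adjustment described above (just the
arithmetic, not the PPCReduceCRLogicals implementation):

    #include <cstdio>

    int main() {
      double FalseSide = 0.7; // per-branch probability of the false side
      double ToFallthrough = FalseSide * FalseSide; // conda and condb false
      double ToTarget = 1.0 - ToFallthrough;
      // Prints roughly 0.49 and 0.51, close to the original 50%/50% split.
      std::printf("fallthrough: %.2f  target: %.2f\n", ToFallthrough,
                  ToTarget);
      return 0;
    }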

Differential Revision: https://reviews.llvm.org/D62430

llvm-svn: 362237
2019-05-31 16:11:17 +00:00
Jinsong Ji 18e7bf5c4d [MachinePipeliner][NFC] Add some debug log and statistics
This is to add some log and statistics for debugging

Differential Revision: https://reviews.llvm.org/D62165

llvm-svn: 362233
2019-05-31 15:35:19 +00:00
Russell Gallop 802c9b59d5 ftime-trace: Trace loop passes
These can take a significant amount of time in some builds.

Suggested by Andrea Di Biagio.

Differential Revision: https://reviews.llvm.org/D62666

llvm-svn: 362219
2019-05-31 10:14:04 +00:00
Roman Lebedev 39390d8317 [InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold
It looks like this fold was already partially happening, indirectly
via some other folds, but with a one-use limitation.
No other fold here has that restriction.

https://rise4fun.com/Alive/ftR
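
A quick C++ sanity check of the identity in the title, using unsigned
wrap-around arithmetic (illustrative values, not the InstCombine code):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t C = 100, C2 = 37, X = 0xDEADBEEF;
      // C - (C2 - X) == X + (C - C2) holds for all values modulo 2^32.
      assert(C - (C2 - X) == X + (C - C2));
      return 0;
    }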

llvm-svn: 362217
2019-05-31 09:47:16 +00:00
Roman Lebedev 886c4ef35a [InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold
https://rise4fun.com/Alive/qJQ

llvm-svn: 362216
2019-05-31 09:47:04 +00:00
Cullen Rhodes 0fc3a07398 [AArch64][SVE2] Asm: support WHILE instructions
Summary:
Patch adds support for the following instructions:
    * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62601

llvm-svn: 362215
2019-05-31 09:13:55 +00:00
Cullen Rhodes 087d1337f8 [AArch64][SVE2] Asm: support TBL/TBX instructions
Summary:
SVE2 adds a three-source variant of the existing SVE TBL instruction.
This is implemented with minor changes to the existing TableGen class.
TBX is a new instruction with its own definition.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62600

llvm-svn: 362214
2019-05-31 09:06:53 +00:00
Cullen Rhodes 2e870011b6 [AArch64][SVE2] Asm: support SVE2 store instructions
Summary:
Patch adds support for the following instructions:
    * STNT1B, STNT1H, STNT1S, STNT1D

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62599

llvm-svn: 362213
2019-05-31 08:59:40 +00:00
Petar Avramovic efcd3c0009 [MIPS GlobalISel] Handle position independent code
Handle position independent code for MIPS32.
When the callee is a global address, lowerCall will emit the callee
as a G_GLOBAL_VALUE and add a target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handling the case where
target flags were attached by lowerCall.

Differential Revision: https://reviews.llvm.org/D62589

llvm-svn: 362210
2019-05-31 08:27:06 +00:00
Petar Avramovic 9058b50fb2 [mips] Move initGlobalBaseReg to MipsFunctionInfo. NFC
Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo.
This way functions used for handling position independent code during
instruction selection, getGlobalBaseReg and initGlobalBaseReg,
end up in the same class.

Differential Revision: https://reviews.llvm.org/D62586

llvm-svn: 362206
2019-05-31 08:15:28 +00:00
Craig Topper b457e430f3 [InstructionSimplify] Add missing implementation of llvm::SimplifyUnOp. NFC
There are no callers currently, but the function is declared so we should at
least implement it.

llvm-svn: 362205
2019-05-31 08:10:23 +00:00
Petar Avramovic f4a6dd28b6 [MIPS GlobalISel] Lower call for callee that is register
Lower calls for MIPS32 where the callee is a register.
The register should contain the callee function's address.

Differential Revision: https://reviews.llvm.org/D62585

llvm-svn: 362204
2019-05-31 08:06:17 +00:00
Craig Topper 31d00d80a2 [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.

Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.

This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?

llvm-svn: 362203
2019-05-31 07:38:26 +00:00
Craig Topper b79cc5f802 [X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead.
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.

This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.

llvm-svn: 362199
2019-05-31 06:21:53 +00:00
Puyan Lotfi 0d63cef180 [MIR-Canon] Skip the first N vreg names lazily.
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows whether it should skip as for the first initialization
or as for subsequent skips.

The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).

Differential Revision: https://reviews.llvm.org/D62717

llvm-svn: 362195
2019-05-31 06:02:38 +00:00
Craig Topper 23066033a1 [X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions.
This makes the 5 address operands come first and the data operand come last.

This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.

Fixes a -verify-machineinstrs failure.

Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.

llvm-svn: 362193
2019-05-31 05:20:27 +00:00
Puyan Lotfi 2a901401fe [MIR-Canon] Hardening propagateLocalCopies.
This is almost NFC; it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes a uses-iterator invalidation bug.

Differential Revision: https://reviews.llvm.org/D62713

llvm-svn: 362191
2019-05-31 04:49:58 +00:00
Pengfei Wang 2e67d0c842 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in llvm

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62366

llvm-svn: 362188
2019-05-31 02:50:41 +00:00
Sam Clegg 9d21f510ee Fix -DBUILD_SHARED_LIBS=ON build after rL362160
Differential Revision: https://reviews.llvm.org/D62709

llvm-svn: 362180
2019-05-31 01:04:00 +00:00
Craig Topper 70dc2200a2 [X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC
The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags.

The users of these either have their own type constraint or rely on the type constraint system to realize the only legal extend would be to f64.

llvm-svn: 362175
2019-05-30 23:35:24 +00:00
Matt Arsenault 18659f84b2 MISched: Fix -misched-regpressure=0 if subreg liveness enabled
Test is waiting on fixing several more crashes in the AMDGPU scheduler
implementation with this.

llvm-svn: 362174
2019-05-30 23:31:36 +00:00
Craig Topper d6b74cc859 [X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC
The LoadExt table defaults to all combinations being Legal. For
vector types, only src VTs with an i1 element type were ever changed.
So we don't need to mark them legal manually.

llvm-svn: 362170
2019-05-30 22:29:06 +00:00
Francis Visoiu Mistrih 48998d10e0 [Remarks] Fix usage of enum class
Breaks the build on some compilers:

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9720/steps/build%20stage%201/logs/stdio

llvm-svn: 362165
2019-05-30 22:01:56 +00:00
Francis Visoiu Mistrih 6ada11f134 [Remarks][NFC] Move the serialization to lib/Remarks
Separate the remark serialization to YAML from the LLVM Diagnostics.

This adds a new serialization abstraction: remarks::Serializer. It's
completely independent from lib/IR and it provides an easy way to
replace YAML by providing a new remarks::Serializer.

Differential Revision: https://reviews.llvm.org/D62632

llvm-svn: 362160
2019-05-30 21:45:59 +00:00
Puyan Lotfi daaecf98c9 [MIR-Canon] Fixing case where MachineFunction is empty.
In cases where the machine function is empty, bail on the RPO traversal.

Differential Revision: https://reviews.llvm.org/D62617

llvm-svn: 362158
2019-05-30 21:37:25 +00:00
Roman Lebedev 46511d75b5 [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with an endless combine loop. That makes sense, since we don't constant
fold for opaque constants.

llvm-svn: 362156
2019-05-30 21:10:37 +00:00
Nikita Popov e906f2a370 [CVP] Generalize willNotOverflow(); NFC
Change argument from WithOverflowInst to BinaryOpIntrinsic, so this
function can also be used for saturating math intrinsics.

llvm-svn: 362152
2019-05-30 21:03:10 +00:00
Lang Hames a100042b27 [RuntimeDyld] Update reserveAllocationSpace to account for stub padding.
This should fix the buildbot failures caused by r362139.

llvm-svn: 362151
2019-05-30 20:58:28 +00:00
Martin Storsjo 0fe645c086 [InstCombine] Avoid use after free in DenseMap, when built with GCC
Previously, this used a statement like this:
    Map[A] = Map[B];

This is equivalent to the following:
    const auto &Src = Map[B];
    auto &Dest = Map[A];
    Dest = Src;

The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.

This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaluating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)

With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.
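
A hedged sketch of the safe pattern with llvm::DenseMap (requires LLVM
headers; the key and value types are illustrative):

    #include "llvm/ADT/DenseMap.h"

    void copyEntry(llvm::DenseMap<int, int> &Map, int A, int B) {
      // Read the source into a local first; the insertion below may grow
      // the map and invalidate any reference into its old storage.
      int Src = Map.lookup(B);
      Map[A] = Src;
    }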

This fixes PR42065.

Differential Revision: https://reviews.llvm.org/D62624

llvm-svn: 362150
2019-05-30 20:53:21 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again, only vectors are affected. Frustrating. Let me take a look at that.

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in the next patch,
and somewhat recovers from the regression to the AMDGPU test in D62223.

It is indeed not great that we leave the vector decrement as-is and
don't transform it into a vector add of all-ones.

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of the D62223 patch.
While I don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The AArch64 changes look neutral;
SPARC and SystemZ look like an improvement (one less instruction each);
for X86, the 32-bit case improves, and the 64-bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`; it looks like the spill
is now hoisted into the preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload and are elsewhere,
but I'm not sure how that affects that loop.

I'm unable to interpret the AMDGPU change; it looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Lang Hames 0e124b37bd [RuntimeDyld] Apply padding and alignment bumps to all sections with stubs, and
increase the MachO/x86-64 stub alignment to 8.

Stub alignment should be guaranteed for any section containing RuntimeDyld
stubs/GOT-entries. To do this we should pad and align all sections containing
stubs, not just code sections.

This commit also bumps the MachO/x86-64 stub alignment to 8, so that GOT entries
will be aligned.

llvm-svn: 362139
2019-05-30 19:59:20 +00:00