Commit Graph

124284 Commits

Author SHA1 Message Date
Cullen Rhodes 75dfbdc2da [AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add
Summary:
Patch adds support for the indexed and unpredicated vectors forms of the
FMLALB, FMLALT, FMLSLB and FMLSLT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62386

llvm-svn: 361935
2019-05-29 08:53:06 +00:00
Cullen Rhodes 4f58ad4e72 [AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group
Summary:
Patch adds support for the following instructions:

SVE2 floating-point pairwise operations:
    * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62383

llvm-svn: 361933
2019-05-29 08:40:33 +00:00
Richard Trieu c77aff7e17 Inline a variable into debug section to fix unused variable warning.
llvm-svn: 361927
2019-05-29 04:09:32 +00:00
Richard Trieu e8698ead9d Inline value into debug statement to avoid unused variable warning.
llvm-svn: 361924
2019-05-29 03:43:01 +00:00
Peter Collingbourne 31fda09b2d Add IR support, ELF section and user documentation for partitioning feature.
The partitioning feature was proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html

This is mostly just documentation. The feature itself will be contributed
in subsequent patches.

Differential Revision: https://reviews.llvm.org/D60242

llvm-svn: 361923
2019-05-29 03:29:01 +00:00
Peter Collingbourne 10c548cdfa IR: Give the TypeAllocator a more generic name and start using it for section names as well. NFCI.
This prepares us to start using it for partition names.

llvm-svn: 361922
2019-05-29 03:28:51 +00:00
Jinsong Ji f6cb3bcb4c Support resource tracking with InstrSchedModel
The current design use DFA to do resource tracking in SMS,
and DFA only support InstrItins, and also has scaling limitation.

This patch extend SMS to allow Subtarget to use ProcResource in
InstrSchedModel instead.

Differential Revision: https://reviews.llvm.org/D62163

llvm-svn: 361919
2019-05-29 03:02:59 +00:00
Pengfei Wang 72e3f9662b Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to"
This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b.

llvm-svn: 361918
2019-05-29 02:49:59 +00:00
Pengfei Wang 818c652643 [X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to
avoid static check fail

RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62006

Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
2019-05-29 02:20:37 +00:00
Fangrui Song 656afe370d [X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL
D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the
2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning
of the call instruction.

The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:

    0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <.text+0x7>
                         3: R_X86_64_GOTPC32_TLSDESC     a-0x4
    7:   ff 10                   callq  *(%rax)
                         7: R_X86_64_TLSDESC_CALL        a

=>

    0:   48 c7 c0 fc ff ff ff    mov    $0xfffffffffffffffc,%rax
    7:   66 90                   xchg   %ax,%ax

Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is
seen.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D62512

llvm-svn: 361910
2019-05-29 02:02:59 +00:00
Thomas Lively 26d711be6e [WebAssembly] Add signatures for RINT builtins
Reviewers: azakai, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62564

llvm-svn: 361904
2019-05-29 01:06:00 +00:00
Quentin Colombet a6f57ad2c9 [RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).

The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.

That's obviously wrong.

The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.

However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.

This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.

llvm-svn: 361901
2019-05-28 23:43:12 +00:00
Lang Hames eb5ee3004f [ORC] Track JIT symbol states more explicitly.
Prior to this patch, JITDylibs inferred symbol states (whether a symbol was
newly added, materializing, resolved, or ready to run) via a combination of (1)
bits in the JITSymbolFlags member, and (2) the state of some internal JITDylib
data structures. This patch explicitly tracks symbol states by adding a new
SymbolState member to the symbol table entries, and removing the 'Lazy' and
'Materializing' bits from JITSymbolFlags. This is a first step towards adding
additional states representing initialization phases (e.g. eh-frame registration,
registration with the language runtime, and static initialization).

llvm-svn: 361899
2019-05-28 23:35:44 +00:00
Jessica Paquette b73ea75b38 [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.

Add a test to show that we don't do this with other immediates.

Differential Revision: https://reviews.llvm.org/D62539

llvm-svn: 361888
2019-05-28 22:52:49 +00:00
Heejin Ahn 5514658591 [WebAssembly] Support for atomic fences
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.

Reviewers: dschuff, jfb, sunfish, tlively

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D50277

llvm-svn: 361884
2019-05-28 22:09:12 +00:00
Rong Xu e88173abc0 [PGO] Handle cases of failing to split critical edges
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.

Differential Revision: https://reviews.llvm.org/D62439

llvm-svn: 361882
2019-05-28 21:45:56 +00:00
Nikita Popov 5b32f60ec3 Revert "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"
This reverts commit 53f2f32865.

As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.

llvm-svn: 361881
2019-05-28 21:28:24 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Adhemerval Zanella 34d8daae53 [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Nikita Popov c51cdacab9 [InstCombine] Clean up saturing math overflow optimizations; NFC
Reduce duplication and make it easier to handle signed
always-overflows conditions in the future.

llvm-svn: 361863
2019-05-28 18:59:21 +00:00
Nikita Popov 332c100562 [ValueTracking][ConstantRange] Distinguish low/high always overflow
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).

Differential Revision: https://reviews.llvm.org/D62463

llvm-svn: 361858
2019-05-28 18:08:31 +00:00
Nikita Popov 2fb0a820df [IR] Add SaturatingInst and BinaryOpIntrinsic classes
Based on the suggestion in D62447, this adds a SaturatingInst class
that represents the saturating add/sub family of intrinsics. It
exposes the same interface as WithOverflowInst, for this reason I
have also added a common base class BinaryOpIntrinsic that holds the
actual implementation code and will be useful in some places handling
both overflowing and saturating math.

Differential Revision: https://reviews.llvm.org/D62466

llvm-svn: 361857
2019-05-28 18:08:06 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00
Roman Lebedev 6a24c9b9ab [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 361854
2019-05-28 17:54:04 +00:00
Roman Lebedev 1499f65ac1 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361853
2019-05-28 17:53:54 +00:00
Roman Lebedev 19f51ec04a [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361852
2019-05-28 17:53:43 +00:00
Sanjay Patel f7980e727f Revert "[x86] split 256-bit store of concatenated vectors"
This reverts commit d5a8637072.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

llvm-svn: 361850
2019-05-28 17:37:58 +00:00
Matt Arsenault 24e80b8d04 AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.

llvm-svn: 361848
2019-05-28 16:46:02 +00:00
Michael Liao 7166843f1e [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
  Otherwise, that promotes that scalar register into vector one, which
  breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
  that (lane mask) scalar register is legalized firstly before its
  definition, e.g., due to the mismatch block placement and its
  topological order or loop. In that cases, the legalization of PHI
  introduces the use of that scalar register as `vreg_1`.

Reviewers: rampitec, nhaehnle, arsenm, alex-t

Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62492

llvm-svn: 361847
2019-05-28 16:29:39 +00:00
Simon Tatham 760df47b77 [ARM] Replace fp-only-sp and d16 with fp64 and d32.
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.

Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.

A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60691

llvm-svn: 361845
2019-05-28 16:13:20 +00:00
Fangrui Song 448a79d123 [AArch64] Delete unused VariantKind in AArch64MCExpr
llvm-svn: 361844
2019-05-28 16:11:56 +00:00
David Greene 561fcc0d63 [X86-64] Fix 256-bit SET0 lowering for non-VLX targets
If we don't have VLX then 256-bit SET0 should be lowered
to VPXOR with ZMM registers.  This restores functionality
accidentally removed by r309926.

Differential Revision: https://reviews.llvm.org/D62415

llvm-svn: 361843
2019-05-28 15:37:01 +00:00
Nico Weber a2ca6e7803 llvm-undname: Support demangling char8_t
Ports clang's mangling support added in r354633 to llvm-undname.

llvm-svn: 361839
2019-05-28 15:30:04 +00:00
Nico Weber 88ab281b4d llvm-undname: Add support for local static thread guards
llvm-svn: 361835
2019-05-28 14:54:49 +00:00
Jason Liu 9212206d25 [XCOFF] Implement parsing symbol table for xcoffobjfile and output as yaml format
Summary:
This patch implement parsing symbol table for xcoffobjfile and
output as yaml format. Parsing auxiliary entries of a symbol
will be in a separate patch.

The XCOFF object file (aix_xcoff.o) used in the test comes from
-bash-4.2$ cat test.c
extern int i;
extern int TestforXcoff;
int main()
{
i++;
TestforXcoff--;
}

Patch by DiggerLin

Reviewers: sfertile, hubert.reinterpretcast, MaskRay, daltenty

Differential Revision: https://reviews.llvm.org/D61532

llvm-svn: 361832
2019-05-28 14:37:59 +00:00
Sanjay Patel d5a8637072 [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 361822
2019-05-28 13:54:17 +00:00
Simon Pilgrim 9cd9624fb6 [DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI.
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.

llvm-svn: 361821
2019-05-28 13:46:26 +00:00
David Stenberg 5d0e6b6755 Stop undef fragments from closing non-overlapping fragments
Summary:
When DwarfDebug::buildLocationList() encountered an undef debug value,
it would truncate all open values, regardless if they were overlapping or
not. This patch fixes so that it only does that for overlapping fragments.

This change unearthed a bug that I had introduced in D57511,
which I have fixed in this patch. The code in DebugHandlerBase that
changes labels for parameter debug values could break DwarfDebug's
assumption that the labels for the entries in the debug value history
are monotonically increasing. Before this patch, that bug could result
in location list entries whose ending address was lower than the
beginning address, and with the changes for undef debug values that this
patch introduces it could trigger an assertion, due to attempting to
emit location list entries with empty ranges. A reproducer for the bug
is added in param-reg-const-mix.mir.

Reviewers: aprantl, jmorse, probinson

Reviewed By: aprantl

Subscribers: javed.absar, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D62379

llvm-svn: 361820
2019-05-28 13:23:25 +00:00
Matt Arsenault d3ed418ad3 MIR: Fix printer crashing on dead CSR frame indexes
llvm-svn: 361819
2019-05-28 13:08:31 +00:00
Sanjay Patel 6bf4ca9d2e [x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."

Differential Revision: https://reviews.llvm.org/D62506

llvm-svn: 361815
2019-05-28 12:58:07 +00:00
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Graham Hunter 19e91253c0 [NFC] Test commit, delete trailing whitespace
llvm-svn: 361813
2019-05-28 12:36:39 +00:00
Hans Wennborg d936e40575 Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This was reverted in r360086 as it was supected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 361811
2019-05-28 12:19:38 +00:00
Yevgeny Rouban 53f2f32865 [CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

llvm-svn: 361808
2019-05-28 11:33:50 +00:00
Simon Pilgrim 4b48aa0e30 [X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI.
llvm-svn: 361804
2019-05-28 10:53:23 +00:00
Cullen Rhodes f57bd6bd23 [AArch64][SVE2] Asm: support SVE2 Floating Point Convert Group
Summary:
Patch adds support for the following intructions:

SVE2 floating-point convert precision:
    * FCVTXNT, FCVTNT, FCVTLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62382

llvm-svn: 361801
2019-05-28 09:36:52 +00:00
Cullen Rhodes 8e91dd7934 [AArch64][SVE2] Asm: support SVE2 Crypto Extensions Group
Summary:
Patch adds support for the following instructions:

SVE2 crypto constructive binary operations:
    * SM4EKEY, RAX1

SVE2 crypto destructive binary operations:
    * AESE, AESD, SM4E

SVE2 crypto unary operations:
    * AESMC, AESIMC

AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes.  SM4E and
SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62307

llvm-svn: 361797
2019-05-28 09:13:17 +00:00
Cullen Rhodes c4ed601bd9 [AArch64][SVE2] Asm: support SVE2 Histogram Computation Groups
Summary:
Patch adds support for the following instructions:

SVE2 histogram generation (segment):
    * HISTSEG

SVE2 histogram generation (vector):
    * HISTCNT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62306

llvm-svn: 361796
2019-05-28 08:51:59 +00:00
Cullen Rhodes 7d9cac5bba [AArch64][SVE2] Asm: support SVE2 Misc Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise exclusive-or interleaved:
    * EORBT, EORTB

SVE2 bitwise permute:
    * BEXT, BDEP, BGRP

SVE2 bitwise shift left long:
    * SSHLLB, SSHLLT, USHLLB, USHLLT

SVE2 integer add/subtract interleaved long:
    * SADDLBT, SSUBLBT, SSUBLTB

BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other
instructions in this group are enabled with +sve2.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62304

llvm-svn: 361795
2019-05-28 08:42:22 +00:00
Craig Topper ab53c5e5ab [InlineCost] Fix a couple comments. NFC
Replace "unary operator" with "unary instruction" in visitUnaryInstruction since
we now have a UnaryOperator class which might needs its own visit function.

Fix a copy/paste in visitCastInst that appears to have been copied from
visitPtrToInt.

llvm-svn: 361794
2019-05-28 07:25:27 +00:00
Craig Topper 50d502826b [CostModel] Add really basic support for being able to query the cost of the FNeg instruction.
Summary:
This reuses the getArithmeticInstrCost, but passes dummy values of the second
operand flags.

The X86 costs are wrong and can be improved in a follow up. I just wanted to
stop it from reporting an unknown cost first.

Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62444

llvm-svn: 361788
2019-05-28 04:09:18 +00:00
Nico Weber f83c39e53f llvm-undname: Remove unreachable statement
llvm-svn: 361786
2019-05-28 01:20:36 +00:00
Nico Weber 82dc06c340 llvm-undname: Extract demangleMD5Name() method; no behavior change
llvm-svn: 361783
2019-05-27 23:10:42 +00:00
Lang Hames 23343c5d90 [RuntimeDyld][ARM] Fix an incorrect assertion condition.
Fixes https://llvm.org/PR42036

llvm-svn: 361782
2019-05-27 21:34:31 +00:00
Matt Arsenault ca84c4be4b RegAllocFast: Set MayLiveAcrossBlocks when allocating uses
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.

Fixes bug 41973.

llvm-svn: 361781
2019-05-27 20:37:31 +00:00
Sanjay Patel 2f99d009c1 [SelectionDAG] fold concat of extract subvectors
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.

Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.

llvm-svn: 361780
2019-05-27 20:26:21 +00:00
Sanjay Patel e13ae3e4d8 [SelectionDAG] fix formatting and redundant comments; NFC
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.

llvm-svn: 361778
2019-05-27 18:26:43 +00:00
Michael Liao 9c70c574b4 [SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
  `copyto` is `implicit-def`ed. However, it only works when that
  `implicit-def` is single-used since it detects that from
  `implicit-def` and cannot determine which destination vreg should be
  used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
  that `copyto`'s source is defined from `implicit-def`, it simplifies
  it. Hence, it works even that `implicit-def` is multi-used.
- Except it simplifies the internal IR, it won't improve the quality of
  code generation. However, it helps to detect 'implicit-def` in a
  straight-forward manner in some passes, such as `si-i1-copies`. A test
  case is added.

Reviewers: sunfish, nhaehnle

Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62342

llvm-svn: 361777
2019-05-27 18:26:29 +00:00
Alexander Timofeev f4040a0dd8 [AMDGPU] Fix for the address sanitizer failure. Fixing typo
llvm-svn: 361776
2019-05-27 18:17:21 +00:00
Dmitri Gribenko 5379f1a6c5 Include what you use in AArch64AsmBackend.cpp
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies.  Doing so is problematic
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).

llvm-svn: 361774
2019-05-27 17:03:57 +00:00
Simon Pilgrim ebb053b139 [SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.

llvm-svn: 361773
2019-05-27 16:39:25 +00:00
Simon Pilgrim d99f9373d3 [LLParser] Fix uninitialized flag variable warnings. NFCI.
Fixes a large number of warnings in the scan-build report on llvm builds.

llvm-svn: 361772
2019-05-27 16:33:15 +00:00
Alexander Timofeev 4a7c4069ae [AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5  Divergence driven ISel. Assign register class for cross block values according to the divergence.

llvm-svn: 361770
2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky b79af7930c [AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61017

llvm-svn: 361763
2019-05-27 14:08:43 +00:00
Xing Xue 3860aad6e7 [MustExecute] Improve MustExecute to correctly handle loop nest
Summary:
for.outer:
  br for.inner
for.inner:
  LI <loop invariant load instruction>
for.inner.latch:
  br for.inner, for.outer.latch
for.outer.latch:
  br for.outer, for.outer.exit

LI is a loop invariant load instruction that post dominate for.outer, so LI should be able to move out of the loop nest. However, there is a bug in allLoopPathsLeadToBlock().

Current algorithm of allLoopPathsLeadToBlock()

  1. get all the transitive predecessors of the basic block LI belongs to (for.inner) ==> for.outer, for.inner.latch
  2. if any successors of any of the predecessors are not for.inner or for.inner's predecessors, then return false
  3. return true

Although for.inner.latch is for.inner's predecessor, but for.inner dominates for.inner.latch, which means if for.inner.latch is ever executed, for.inner should be as well. It should not return false for cases like this.

Author: Whitney (committed by xingxue)

Reviewers: kbarton, jdoerfert, Meinersbur, hfinkel, fhahn

Reviewed By: jdoerfert

Subscribers: hiraditya, jsji, llvm-commits, etiotto, bmahjour

Tags: #LLVM

Differential Revision: https://reviews.llvm.org/D62418

llvm-svn: 361762
2019-05-27 13:57:28 +00:00
Nikola Prica 441ad62531 Test commit (NFC)
Add blank line.

llvm-svn: 361761
2019-05-27 13:51:30 +00:00
Diana Picus 68b20c589c [ARM GlobalISel] Cleanup CallLowering a bit
We never actually use the Offsets produced by ComputeValueVTs, so remove
them until we need them.

llvm-svn: 361755
2019-05-27 10:30:33 +00:00
David L. Jones 0ff41b8a5a Revert r361356: "[MIR] Add simple PRE pass to MachineCSE"
This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356

It seems like the plan already was to revert, but that hasn't happened yet.

llvm-svn: 361746
2019-05-27 06:00:00 +00:00
Nico Weber cfe08bc7d6 llvm-undname: Make demangling of MD5 names more robust
Demangler::parse() for MD5 names would:

1. Put all remaining text into the MD5 name sight unseen
2. Not modify MangledName

This meant that if the demangler recursively called parse() (e.g. in
demangleLocallyScopedNamePiece()), every recursive call that started on
an MD5 name would add all remaining bytes to the output buffer but
only advance the input by a byte.  For valid inputs, MD5 types are
never (well, see comments for 2 exceptions) nested, but for invalid
input this could cause memory use quadratic in the input size.

llvm-svn: 361744
2019-05-27 00:48:59 +00:00
Florian Hahn 11b2f4fe50 [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches.
The code to preserve LCSSA PHIs currently only properly supports
reduction PHIs and PHIs for values defined outside the latches.

This patch improves the LCSSA PHI handling to cover PHIs for values
defined in the latches.

Fixes PR41725.

Reviewers: efriedma, mcrosier, davide, jdoerfert

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D61576

llvm-svn: 361743
2019-05-26 23:38:25 +00:00
Yonghong Song e698958ad8 [BPF] generate R_BPF_NONE relocation for BTF DataSec variables
The variables in BTF DataSec type encode in-section offset.
R_BPF_NONE should be generated instead of R_BPF_64_32.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D62460

llvm-svn: 361742
2019-05-26 21:26:06 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Andrea Di Biagio c2493ce4a4 [MCA][Scheduler] Improved critical memory dependency computation.
This fixes a problem where back-pressure increases caused by register
dependencies were not correctly notified if execution was also delayed by memory
dependencies.

llvm-svn: 361740
2019-05-26 19:50:31 +00:00
Simon Pilgrim 06e02856ab [SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI.
Prep work before adding demanded elts support.

llvm-svn: 361739
2019-05-26 18:58:14 +00:00
Simon Pilgrim 2916b9e28c [SelectionDAG] MaskedValueIsZero - add demanded elements implementation
Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage.

llvm-svn: 361738
2019-05-26 18:43:44 +00:00
Andrea Di Biagio a549dd2560 [MCA] Refactor the logic that computes the critical memory dependency info. NFCI
CriticalRegDep has been renamed CriticalDependency, and it is now used by class
Instruction to store information about the critical register dependency and the
critical memory dependency. No functional change intendend.

llvm-svn: 361737
2019-05-26 18:41:35 +00:00
Shawn Landden 343578759e [SimplifyCFG] back out all SwitchInst commits
They caused the sanitizer builds to fail.

My suspicion is the change the countLeadingZeros().

llvm-svn: 361736
2019-05-26 18:15:51 +00:00
Simon Pilgrim a044410f37 [X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel

llvm-svn: 361734
2019-05-26 16:00:35 +00:00
Simon Pilgrim e434368a67 Revert rL361731 : [LLParser] Fix uninitialized variable warnings. NFCI.
These 3 variables cause quite a few warnings in the scan-build report on llvm.
........
Revert accidental commit.

llvm-svn: 361732
2019-05-26 15:08:45 +00:00
Simon Pilgrim aabe7781a5 [LLParser] Fix uninitialized variable warnings. NFCI.
These 3 variables cause quite a few warnings in the scan-build report on llvm.

llvm-svn: 361731
2019-05-26 15:05:12 +00:00
Sanjay Patel 9317963920 [InstCombine] prevent crashing with invalid extractelement index
This was found/reduced from a fuzzer report:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956

llvm-svn: 361729
2019-05-26 14:03:50 +00:00
Shawn Landden fa91ab85d9 [SimplifyCFG] ReduceSwitchRange: Improve on the case where the SubThreshold doesn't trigger
llvm-svn: 361728
2019-05-26 13:55:52 +00:00
Shawn Landden 30111c786f [SimplifyCFG] Run ReduceSwitchRange unconditionally, generalize
Rather than gating on "isSwitchDense" (resulting in necessesarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.

This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.

Be careful to not generate worse code, by introducing a
SubThreshold heuristic.

Instead of just sorting by signed, generalize the finding of the
best base.

And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32768 and 32769, of which
32769 would be interpreted as -32768, and now the code thinks
the table is size 65536.

(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)

And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".

Also, fix generation of invalid LLVM-IR: shl by bit-width.

llvm-svn: 361727
2019-05-26 13:55:14 +00:00
Shawn Landden 444eaaf1cc [SimpligyCFG] NFC, remove GCD that was only used for powers of two
and replace with an equilivent countTrailingZeros.

GCD is much more expensive than this, with repeated division.

This depends on D60823

llvm-svn: 361726
2019-05-26 13:54:04 +00:00
Shawn Landden b7cc093db2 [Support] make countLeadingZeros() and countTrailingZeros() return unsigned
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().

(as well as __builtin_clzll())

llvm-svn: 361724
2019-05-26 13:49:58 +00:00
Nikita Popov d0f13e618f [ValueTracking] Base computeOverflowForUnsignedMul() on ConstantRange code; NFCI
The implementation in ValueTracking and ConstantRange are equally
powerful, reuse the one in ConstantRange, which will make this easier
to extend.

llvm-svn: 361723
2019-05-26 13:22:01 +00:00
Nikita Popov 39f2bebf41 [InstCombine] Refactor OptimizeOverflowCheck; NFCI
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).

llvm-svn: 361721
2019-05-26 11:43:37 +00:00
Nikita Popov 352f598795 [InstCombine] Remove OverflowCheckFlavor; NFC
Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.

llvm-svn: 361720
2019-05-26 11:43:31 +00:00
David Green 0dbafe191e [ARM] Select fp16 fma
This adds a pattern for fma, similar to the float and double patterns.

Differential Revision: https://reviews.llvm.org/D62330

llvm-svn: 361719
2019-05-26 11:34:30 +00:00
David Green 21542cd6f4 [ARM] Select a number of fp16 rounding functions
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718
2019-05-26 11:13:00 +00:00
David Green c9f4b7d201 [ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.

Differential Revision: https://reviews.llvm.org/D62325

llvm-svn: 361717
2019-05-26 10:59:21 +00:00
Simon Pilgrim 58a8541dcc [X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.

There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.

llvm-svn: 361716
2019-05-26 10:54:23 +00:00
David Green 2881325b17 [ARM] Select fp16 fabs
This adds a pattern for the fabs intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62324

llvm-svn: 361715
2019-05-26 10:51:58 +00:00
David Green aeade651f3 [ARM] Select fp16 fsqrt
This adds a pattern for the sqrt intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62322

llvm-svn: 361714
2019-05-26 10:42:24 +00:00
David Green caf8a11b65 [ARM] Promote fp16 frem
Promote fp16 frem operations on ARM to floats so they call fmodf.

Differential Revision: https://reviews.llvm.org/D62321

llvm-svn: 361713
2019-05-26 10:30:22 +00:00
David Bolvansky 0290a77aa8 [SimplifyCFG] Added condition assumption for unreachable blocks
Summary: PR41688

Reviewers: spatel, efriedma, craig.topper, hfinkel, reames

Reviewed By: hfinkel

Subscribers: javed.absar, dmgreen, fhahn, hfinkel, reames, nikic, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61409

llvm-svn: 361707
2019-05-25 22:34:27 +00:00
Simon Pilgrim 40fa52b174 [X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C)
Commonly occurs in sign-extension cases

llvm-svn: 361706
2019-05-25 18:02:17 +00:00
Robert Widmann b0fd12b689 [LLVM-C] Add Accessor for Mach-O Universal Binary Slices
Summary: Allow for retrieving an object file corresponding to an architecture-specific slice in a Mach-O universal binary file.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60378

llvm-svn: 361705
2019-05-25 16:47:27 +00:00
Nikita Popov d87eceda0e [X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.

Differential Revision: https://reviews.llvm.org/D62448

llvm-svn: 361704
2019-05-25 16:44:29 +00:00
Nikita Popov 6bb5041e94 [LVI][CVP] Add support for saturating add/sub
Adds support for the uadd.sat family of intrinsics in LVI, based on
ConstantRange methods from D60946.

Differential Revision: https://reviews.llvm.org/D62447

llvm-svn: 361703
2019-05-25 16:44:14 +00:00
Nikita Popov 8b1fa07639 [CVP] Remove unnecessary checks for empty GNWR; NFC
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.

To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.

llvm-svn: 361698
2019-05-25 14:11:55 +00:00
Sanjay Patel 91131b6500 [SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.

llvm-svn: 361696
2019-05-25 13:48:07 +00:00
Nikita Popov 024b18aca7 [LVI][CVP] Calculate with.overflow result range
In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0)
as the range of op(%x, %y). This is mainly useful in conjunction with
D60650: If the result of the operation is extracted in a branch guarded
against overflow, then the value of %x will be appropriately constrained
and the result range of the operation will be calculated taking that
into account.

Differential Revision: https://reviews.llvm.org/D60656

llvm-svn: 361693
2019-05-25 09:53:45 +00:00
Nikita Popov 17367b0d89 [LVI] Extract helper for binary range calculations; NFC
llvm-svn: 361692
2019-05-25 09:53:37 +00:00
Craig Topper 46e5052b8e [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691
2019-05-25 06:17:47 +00:00
Craig Topper 4b08fcdeb1 [X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.

Differential Revision: https://reviews.llvm.org/D62360

llvm-svn: 361690
2019-05-25 04:47:49 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
David Blaikie a17564c2f1 llvm-dwarfdump: Don't error on mixed units using/not using str_offsets
This lead to errors when dumping binaries with v4 and v5 units linked
together (but could've also errored on v5 units that did/didn't use
str_offsets).

Also improves error handling and messages around invalid str_offsets
contributions.

llvm-svn: 361683
2019-05-25 00:07:22 +00:00
Jessica Paquette 97d668d70f [GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.

We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:

- For a def instruction, it only matters if the def instruction must always
  output a value stored on a FPR

- For a use instruction, it only matters if the use instruction must always
  only take in values stored in FPRs

This adds two new functions:

- onlyUsesFP
- onlyDefinesFP

Then we can use those when we're checking the uses/defs instead.

Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.

Differential Revision: https://reviews.llvm.org/D62426

llvm-svn: 361679
2019-05-24 23:08:45 +00:00
Jessica Paquette bede937b16 [GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper function
Factor it out into a function, and replace places where we had the same check
with the new function.

Differential Revision: https://reviews.llvm.org/D62421

llvm-svn: 361677
2019-05-24 22:12:21 +00:00
Jonas Devlieghere 0da8160df3 [dwarfdump] Add flag to limit the number of parents DIEs
This adds `-parent-recurse-depth` which limits the number of parent DIEs
being dumped.

Differential revision: https://reviews.llvm.org/D62359

llvm-svn: 361671
2019-05-24 21:11:28 +00:00
Jason Liu 8e1d921bb3 Implement call lowering without parameters on AIX
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.

Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma

Differential Revision: https://reviews.llvm.org/D61948

llvm-svn: 361669
2019-05-24 20:54:35 +00:00
Jessica Paquette 56503865ed [GlobalISel][AArch64] Improve register bank mappings for G_SELECT
The fcsel and csel instructions differ in only the register banks they work on.

So, they're entirely interchangeable otherwise.

With this in mind, this does two things:

- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
  the outputs.
- Teach it to choose the best register bank mapping based off the constraints
  of the inputs and outputs.

The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.

For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.

Also update the regbank-select.mir to check that we end up with the right
select instruction.

Differential Revision: https://reviews.llvm.org/D62267

llvm-svn: 361665
2019-05-24 19:35:25 +00:00
Nick Desaulniers 33bc64202b [AArch64] check for INLINEASM_BR along w/ INLINEASM
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.

pr/41999

Reviewers: t.p.northover, peter.smith

Reviewed By: peter.smith

Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62402

llvm-svn: 361661
2019-05-24 19:00:13 +00:00
Nick Desaulniers 9f7bd71cf5 [ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.

It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.

pr/41999

Link: https://github.com/ClangBuiltLinux/linux/issues/490

Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover

Reviewed By: peter.smith

Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62400

llvm-svn: 361659
2019-05-24 18:58:21 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Matt Arsenault 0ff901fba0 AMDGPU: Boost inline threshold with addrspacecasted alloca arguments
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.

llvm-svn: 361649
2019-05-24 16:52:35 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Stefan Pintilie 522307fa40 [PowerPC] Remove CRBits Copy Of Unset/set CBit
For the situation, where we generate the following code:

       crxor 8, 8, 8
       < Some instructions>
.LBB0_1:
       < Some instructions>
       cror 1, 8, 8

cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.

This patch will optimize it to:

        < Some instructions>
.LBB0_1:
        < Some instructions>
        cror 1, 1, 1

Patch By: Victor Huang (NeHuang)

Differential Revision: https://reviews.llvm.org/D62044

llvm-svn: 361632
2019-05-24 12:05:37 +00:00
Cullen Rhodes b3e58df80c [AArch64][SVE2] Asm: support SVE2 String Processing Group
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62206

llvm-svn: 361627
2019-05-24 10:32:01 +00:00
Cullen Rhodes adb1d74bf9 [AArch64][SVE2] Asm: support SVE2 Narrowing Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift right narrow:
    * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
      SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
      UQRSHRNT

SVE2 integer add/subtract narrow high part:
    * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT

SVE2 saturating extract narrow:
    * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62205

llvm-svn: 361624
2019-05-24 10:22:30 +00:00
Cullen Rhodes 5f04f00282 [AArch64][SVE2] Asm: support SVE2 Accumulate Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift and insert:
    * SRI, SLI

SVE2 bitwise shift right and accumulate:
    * SSRA, USRA, SRSRA, URSRA

SVE2 complex integer add:
    * CADD, SQCADD

SVE2 integer absolute difference and accumulate:
    * SABA, UABA

SVE2 integer absolute difference and accumulate long:
    * SABALB, SABALT, UABALB, UABALT

SVE2 integer add/subtract long with carry:
    * ADCLB, ADCLT, SBCLB, SBCLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62204

llvm-svn: 361622
2019-05-24 10:10:34 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Cullen Rhodes 980f760515 [AArch64][SVE2] Asm: add PMULLB/PMULLT instructions
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62145

llvm-svn: 361619
2019-05-24 09:56:23 +00:00
Cullen Rhodes 8bcea9daaa [AArch64][SVE2] Asm: add integer add/sub long/wide instructions
Summary:
Patch adds support for the following instructions:

SVE2 integer add/subtract long:
    * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT,
      SABDLB, SABDLT, UABDLB, UABDLT

SVE2 integer add/subtract wide:
    * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62142

llvm-svn: 361615
2019-05-24 09:28:27 +00:00
Bjorn Pettersson b4771425f5 Use the DataLayout::typeSizeEqualsStoreSize helper. NFC
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.

llvm-svn: 361613
2019-05-24 09:20:20 +00:00
Cullen Rhodes 968cb0e049 [AArch64][SVE2] Asm: add various bitwise shift instructions
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:

    * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
      SQSHLR, UQSHLR, SQRSHLR, UQRSHLR

Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62140

llvm-svn: 361612
2019-05-24 09:17:23 +00:00
Cullen Rhodes 6bca64fe5e [AArch64][SVE2] Asm: add saturating add/sub instructions
Summary:
Patch adds support for the following instructions:

    * SQADD, UQADD, SUQADD, USQADD
    * SQSUB, UQSUB, SQSUBR, UQSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62130

llvm-svn: 361611
2019-05-24 09:06:37 +00:00
Neil Henning 119c31ad93 StructurizeCFG: Relax uniformity checks.
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:

1. All conditional branches that are direct children are uniform.
2. And either:
  a. All sub-regions are uniform.
  b. There is one or less conditional branches among the direct
     children.

Differential Revision: https://reviews.llvm.org/D62198

llvm-svn: 361610
2019-05-24 08:59:17 +00:00
Cullen Rhodes d9bb7b69ab [AArch64][SVE2] Asm: fix overlapping bit
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.

Raised by Momchil: https://reviews.llvm.org/D62130

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62292

llvm-svn: 361609
2019-05-24 08:45:37 +00:00
Tim Northover 3b2157aeed GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.

llvm-svn: 361608
2019-05-24 08:40:13 +00:00
Tim Northover 3d7a057b0d CodeGen: factor out swifterror value tracking.
llvm-svn: 361607
2019-05-24 08:39:43 +00:00
Simon Atanasyan c1b482f2a5 [mips] Always check that `shift and add` optimization is efficient.
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.

Fix for PR41929.

Differential Revision: https://reviews.llvm.org/D62166

llvm-svn: 361606
2019-05-24 08:39:40 +00:00
Bjorn Pettersson d63a2bb35f [DSE] Bugfix to avoid PartialStoreMerging involving non byte-sized stores
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.

This solves problems seen in
  https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.

The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).

Fixes: https://bugs.llvm.org/show_bug.cgi?id=41949

Reviewers: spatel, efriedma, fhahn

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62250

llvm-svn: 361605
2019-05-24 08:32:02 +00:00
Sjoerd Meijer 937af54666 [ARM] ARMExpandPseudoInsts: add debug messages
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.

Differential Revision: https://reviews.llvm.org/D62297

llvm-svn: 361604
2019-05-24 08:25:02 +00:00
QingShan Zhang 449bfdd1b0 [Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
 we will try to schedule the addi before the load, to hide the latency, and avoid the
 true dependency added by RA. And this only take effects for Power9.

Differential Revision: https://reviews.llvm.org/D61930

llvm-svn: 361600
2019-05-24 05:30:09 +00:00
Yevgeny Rouban c652b3455e [NFC] SwitchInst: Introduce wrapper for prof branch_weights handling
This patch introduces a wrapper class that re-implements
several mutator methods of SwitchInst to handle changes
of prof branch_weights metadata along with remove/add
switch case methods.
Subsequent patches will use this wrapper to implement
prof branch_weights metadata handling for SwitchInst.

Reviewers: davidx, eraman, reames, chandlerc
Reviewed By: davidx
Differential Revision: https://reviews.llvm.org/D62122

llvm-svn: 361596
2019-05-24 04:34:23 +00:00
David Blaikie fc302c2b7f dwarfdump: Deterministically... determine whether parsing a DWARF32 or DWARF64 str_offsets header
Rather than trying one and then the other - use the kind of the CU to
select which kind of header to parse.

llvm-svn: 361589
2019-05-24 01:41:58 +00:00
Reid Kleckner b7a78c7dff [AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.

Fixes PR41997

Reviewers: mgrang, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62344

llvm-svn: 361585
2019-05-24 01:27:20 +00:00
Serge Pavlov ed595e8627 [AArch64] Add nvcast patterns for v2f32 -> v1f64
Summary: Constant stores of f32 values can create such NvCast nodes.

Reviewers: t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62285

llvm-svn: 361584
2019-05-24 01:20:34 +00:00
David Blaikie 79872a88a0 dwarfdump: Add a bit more DWARF64 support
This test case was incorrect because it mixed DWARF32 and DWARF64 for a
single unit (DWARF32 unit referencing a DWARF64 str_offsets section). So
fix enough of the unit parsing for DWARF64 and make the test valid.

(not sure if anyone needs DWARF64 support though - support in
libDebugInfoDWARF has been added piecemeal and LLVM doesn't produce it
at all)

llvm-svn: 361582
2019-05-24 01:05:52 +00:00
Eli Friedman 052f87ae36 Revert r361460
It regresses https://bugs.llvm.org/show_bug.cgi?id=38309 (represented
by the testcase test/Transforms/GlobalOpt/globalsra-multigep.ll).

llvm-svn: 361581
2019-05-24 01:03:51 +00:00
Thomas Lively 55229f6b10 [WebAssembly] Expand more SIMD float ops
Summary: These were previously causing ISel failures.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62354

llvm-svn: 361577
2019-05-24 00:15:04 +00:00
Sanjay Patel 8869a98e82 [InstSimplify] fold insertelement-of-extractelement
This was partly handled in InstCombine (only the constant
index case), so delete that and zap it more generally in
InstSimplify.

llvm-svn: 361576
2019-05-24 00:13:58 +00:00
Sanjay Patel 093c922205 [InstCombine] remove redundant fold for extractelement; NFC
The out-of-bounds index pattern is handled by InstSimplify,
so the extractelement should be eliminated next time it is
visited.

llvm-svn: 361570
2019-05-23 23:33:38 +00:00
Sanjay Patel 4d4df6f144 [InstCombine] remove redundant fold for insertelement; NFC
The out-of-bounds index pattern is handled by InstSimplify.

llvm-svn: 361569
2019-05-23 23:33:34 +00:00
Alina Sbirlea d82ddfa7c3 [NewPassManager] Add tuning option: ForgetAllSCEVInLoopUnroll [NFC].
Summary: Mirror tuning option from old pass manager in new pass manager.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, zzheng, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61612

llvm-svn: 361560
2019-05-23 21:52:59 +00:00
Sanjay Patel e60cb7d1be [InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.

llvm-svn: 361559
2019-05-23 21:49:47 +00:00
Kit Barton 987fdfd9a7 Revert [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
This reverts r361517 (git commit 2049e4dd8f)

llvm-svn: 361553
2019-05-23 20:53:05 +00:00
Sanjay Patel 7d6c0bce50 [DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.

llvm-svn: 361547
2019-05-23 20:17:25 +00:00
Matt Arsenault 5c714cbdd8 AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

llvm-svn: 361541
2019-05-23 19:38:14 +00:00
Alina Sbirlea e4b27869c6 [NewPassManager] Add tuning option: LoopUnrolling [NFC].
Summary: Mirror tuning option from old pass manager in new pass manager.

Reviewers: chandlerc

Subscribers: jlebar, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61618

llvm-svn: 361540
2019-05-23 19:35:40 +00:00
Alina Sbirlea 63729b0c49 [SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.

The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61955

llvm-svn: 361537
2019-05-23 19:07:41 +00:00
Sanjay Patel 3249be1e03 [InstCombine] be more careful when transforming a shuffle mask
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890

Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.

llvm-svn: 361533
2019-05-23 18:46:03 +00:00
Robert Lougher 170dfeb2ff Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info"
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 361527
2019-05-23 18:15:12 +00:00
Thomas Lively e18b5c6237 [WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61037

llvm-svn: 361526
2019-05-23 18:09:26 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Kit Barton 2049e4dd8f [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
Summary:
    This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.

      /// Example:
      /// for (int i = lb; i < ub; i+=step)
      ///   <loop body>
      /// --- pseudo LLVMIR ---
      /// beforeloop:
      ///   guardcmp = (lb < ub)
      ///   if (guardcmp) goto preheader; else goto afterloop
      /// preheader:
      /// loop:
      ///   i1 = phi[{lb, preheader}, {i2, latch}]
      ///   <loop body>
      ///   i2 = i1 + step
      /// latch:
      ///   cmp = (i2 < ub)
      ///   if (cmp) goto loop
      /// exit:
      /// afterloop:
      ///
      /// getBounds
      ///   getInitialIVValue      --> lb
      ///   getStepInst            --> i2 = i1 + step
      ///   getStepValue           --> step
      ///   getFinalIVValue        --> ub
      ///   getCanonicalPredicate  --> '<'
      ///   getDirection           --> Increasing
      /// getGuard             --> if (guardcmp) goto loop; else goto afterloop
      /// getInductionVariable          --> i1
      /// getAuxiliaryInductionVariable --> {i1}
      /// isCanonical                   --> false

    Committed on behalf of @Whitney (Whitney Tsang).

    Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn

    Reviewed By: kbarton

    Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 361517
2019-05-23 17:56:35 +00:00
Thomas Lively eafe8ef6f2 [WebAssembly] Add multivalue and tail-call target features
Summary:
These features will both be implemented soon, so I thought I would
save time by adding the boilerplate for both of them at the same time.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62047

llvm-svn: 361516
2019-05-23 17:26:47 +00:00
Thomas Preud'homme 7b7683d7a6 [FileCheck] Remove llvm:: prefix
Summary:
Remove all llvm:: prefixes in FileCheck library header and
implementation except for calls to make_unique and make_shared since
both files already use the llvm namespace.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, arichardson, probinson, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62323

llvm-svn: 361515
2019-05-23 17:19:36 +00:00
Saleem Abdulrasool 7bbefb13ee Transforms: lower fadd and fsub atomicrmw instructions
`fadd` and `fsub` have recently (r351850) been added as `atomicrmw`
operations. This diff adds lowering cases for them to the LowerAtomic
transform.

Patch by Josh Berdine!

llvm-svn: 361512
2019-05-23 17:03:43 +00:00
Andrea Di Biagio 27b3b5d952 [MCA] Add the ability to compute critical register dependency of an instruction.
This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to
class InstructionBase.
The goal is to allow users to obtain information about the critical register
dependency that most affects the latency of an instruction.

These methods are currently unused. However, the long term plan is to use them
in order to allow the computation of a critical-path as part of the bottleneck
analysis. So, this is yet another step towards fixing PR37494.

llvm-svn: 361509
2019-05-23 16:32:19 +00:00
Shoaib Meenai 87226a7202 [AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).

The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:

```
        .long   a&-1
```

The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.

Patch by Mat Hostetter <mjh@fb.com>.

Differential Revision: https://reviews.llvm.org/D61325

llvm-svn: 361508
2019-05-23 16:29:09 +00:00
Lewis Revill 74927554e2 [RISCV] Support assembling TLS LA pseudo instructions
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.

llvm-svn: 361499
2019-05-23 14:46:27 +00:00
Petar Jovanovic aa28b6d198 [LiveDebugValues] Rename 'DMI' into 'DebugInstr' (NFC)
This will improve code readability.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D62295

llvm-svn: 361497
2019-05-23 13:49:06 +00:00
Andrea Di Biagio dd0d9e01ee [MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.

Methods exposed by the public abstract LSUnitBase interface are:
 - Status isAvailable(const InstRef&);
 - void dispatch(const InstRef &);
 - const InstRef &isReady(const InstRef &);

LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.

No functional change intended.

llvm-svn: 361496
2019-05-23 13:42:47 +00:00
Clement Courbet 43882b16a3 [MergeICmps] Make the pass compatible with the new pass manager.
Reviewers: gchatelet, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62287

llvm-svn: 361490
2019-05-23 12:35:26 +00:00
Andrea Di Biagio 28afd8dc71 [MCA] Make the bool conversion operator in class InstRef explicit. NFCI
This patch makes the bool conversion operator in InstRef explicit.
It also adds a operator< to hel comparing InstRef objects in sets.

llvm-svn: 361482
2019-05-23 10:50:01 +00:00
Petar Jovanovic ff47d83e78 [DwarfExpression] Refactor dwarf expression (NFC)
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D62002

llvm-svn: 361480
2019-05-23 10:37:13 +00:00
Sam Parker 617cdc5a6d [ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.

Differential Revision: https://reviews.llvm.org/D62254

llvm-svn: 361462
2019-05-23 07:46:39 +00:00
Christian Bruel 4a7da98bd9 [GlobalOpt] recognize dead struct fields and propagate values
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.

basically this allows the array in following C code  to be optimized out 

struct Expr {
  int a[2];
  int b;
};

static struct Expr e;

int foo (int i)
{
  e.b = 2;
  e.a[i] = 1;
  return e.b;
}


Reviewers: greened, bkramer, nicholas, jmolloy

Reviewed By: jmolloy

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61911

llvm-svn: 361460
2019-05-23 05:53:10 +00:00
Thomas Lively 1a3cbe720c [WebAssembly] Implement __builtin_return_address for emscripten
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.

Patch by Guanzhong Chen

Reviewers: tlively, aheejin

Reviewed By: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62210

llvm-svn: 361454
2019-05-23 01:24:01 +00:00
Fangrui Song 86c9ca48c3 [X86] Support -fno-plt __tls_get_addr calls
In general dynamic/local dynamic TLS models, with -fno-plt,

* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
  Note, on x86, if we can get rid of %ebx as the PIC register,
  it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`

Reorganize the code by separating 32-bit and 64-bit.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D62106

llvm-svn: 361453
2019-05-23 01:05:13 +00:00
Thomas Preud'homme f3b9bb3d69 [FileCheck] Introduce substitution subclasses
Summary:
With now a clear distinction between string and numeric substitutions,
this patch introduces separate classes to represent them with a parent
class implementing the common interface. Diagnostics in
printSubstitutions() are also adapted to not require knowing which
substitution is being looked at since it does not hinder clarity and
makes the implementation simpler.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: llvm-commits, probinson, arichardson, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62241

llvm-svn: 361446
2019-05-23 00:10:29 +00:00
Thomas Preud'homme 1a944d27b2 FileCheck: Improve FileCheck variable terminology
Summary:
Terminology introduced by [[#]] blocks is confusing and does not
integrate well with existing terminology.

First, variables referred by [[]] blocks are called "pattern variables"
while the text a CHECK directive needs to match is called a "CHECK
pattern". This is inconsistent with variables in [[#]] blocks since
[[#]] blocks are also found in CHECK pattern yet those variables are
called "numeric variable".

Second, the replacing of both [[]] and [[#]] blocks by the value of the
variable or expression they contain is represented by a
FileCheckPatternSubstitution class. The naming refers to being a
substitution in a CHECK pattern but could be wrongly understood as being
a substitution of a pattern variable.

Third and lastly, comments use "numeric expression" to refer both to the
[[#]] blocks as well as to the numeric expressions these blocks contain
which get evaluated at match time.

This patch solves these confusions by
- calling variables in [[]] and [[#]] blocks as string and numeric
  variables respectively;
- referring to [[]] and [[#]] as substitution *blocks*, with the former
  being a string substitution block and the latter a numeric
  substitution block;
- calling [[]] and [[#]] blocks to be replaced by the value of a
  variable or expression they contain a substitution (as opposed to
  definition when these blocks are used to defined a variable), with the
  former being a string substitution and the latter a numeric
  substitution;
- renaming the FileCheckPatternSubstitution as a FileCheckSubstitution
  class with FileCheckStringSubstitution and
  FileCheckNumericSubstitution subclasses;
- restricting the use of "numeric expression" to refer to the expression
  that is evaluated in a numeric substitution.

While numeric substitution blocks only support numeric substitutions of
numeric expressions at the moment there are plans to augment numeric
substitution blocks to support numeric definitions as well as both a
numeric definition and numeric substitution in the same numeric
substitution block.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, arichardson, probinson, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62146

llvm-svn: 361445
2019-05-23 00:10:14 +00:00
Matt Arsenault b79a25b124 TableGen: Handle nontrivial foreach range bounds
This allows using anything that isn't a literal integer as the bounds
for a foreach. Some of the diagnostics aren't perfect, but nobody ever
accused tablegen of having good errors. For example, the existing
wording suggests a bitrange is valid, but as far as I can tell this
has never worked.

Fixes bug 41958.

llvm-svn: 361434
2019-05-22 21:28:20 +00:00
Craig Topper 93f38e1f1a [X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.

This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.

This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.

llvm-svn: 361431
2019-05-22 21:00:18 +00:00
Galina Kistanova ed49f6d8e6 Reverted r361134 because of a failing test left unattended for a long time.
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/regbank-reassign.mir

llvm-svn: 361430
2019-05-22 20:42:56 +00:00
Craig Topper 9816d55776 [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil.  The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions.  For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.

We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.

I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.

llvm-svn: 361425
2019-05-22 20:04:55 +00:00
Craig Topper 2f1895e03d [X86] Add more icelake model numbers to getHostCPUName.
Using model numbers found in Table 2-1 of the May 2019 version
of the Intel Software Developer's Manual Volume 4.

llvm-svn: 361422
2019-05-22 19:51:35 +00:00
Alexey Lapshin 53726588f6 [DebugInfo][AArch64] Recognise target specific instruction as mov instr
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :

 undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;

It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.

llvm-svn: 361417
2019-05-22 18:48:58 +00:00
Hiroshi Yamauchi dfeb797455 [PGO][CHR] Speed up following long use-def chains.
Summary: Avoid visiting an instruction more than once by using a map.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62262

llvm-svn: 361416
2019-05-22 18:37:34 +00:00
Xing Xue 4246b75295 Disable EHFrameSupport in JITLink/RuntimeDyld on AIX
Summary:
EH Frames aren't supported on AIX with the system compiler, but the definition of HAVE_EHTABLE_SUPPORT misses this which causes linking problems on AIX. This patch updates the definition of HAVE_EHTABLE_SUPPORT in both JITLink and RuntimeDyld.

Author: daltenty

Reviewers: sfertile, xingxue, hubert.reinterpretcase

Reviewed By: xingxue

Subscribers: hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62203

llvm-svn: 361410
2019-05-22 17:41:27 +00:00
Matt Arsenault 418e23e33c AMDGPU: Move disassembler support check to constructor
Don't check for unsupported targets for every instruction.

llvm-svn: 361406
2019-05-22 16:28:48 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Kees Cook c2187c20a4 [TargetLowering] Extend bool args to inline-asm according to getBooleanType
Summary:
This extends Krzysztof Parzyszek's X86-specific solution
(https://reviews.llvm.org/D60208) to the generic code pointed out by
James Y Knight.

Reviewers: kparzysz, craig.topper, nickdesaulniers

Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60224

llvm-svn: 361404
2019-05-22 16:16:15 +00:00
Kees Cook a7a687e500 [TargetLowering] Add blank line (test commit)
llvm-svn: 361403
2019-05-22 16:02:13 +00:00
Nico Weber 09fb2029e5 llvm-undname: Fix an assert-on-invalid, found by oss-fuzz
If a template parameter refers to a pointer to member, but the mangling
of that was a string literal instead of a real symbol, llvm-undname used
to crash instead of rejecting the input.

llvm-svn: 361402
2019-05-22 15:53:23 +00:00
Sanjay Patel 5a4f7cf2ff [IR] allow fast-math-flags on select of FP values
This is a minimal start to correcting a problem most directly discussed in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086

We have been hacking around a limitation for FP select patterns by using the
fast-math-flags on the condition of the select rather than the select itself.
This patch just allows FMF to appear with the 'select' opcode. No changes are
needed to "FPMathOperator" because it already includes select-of-FP because
that definition is based on the (return) value type.

Once we have this ability, we can start correcting and adding IR transforms
to use the FMF on a 'select' instruction. The instcombine and vectorizer test
diffs only show that the IRBuilder change is behaving as expected by applying
an FMF guard value to 'select'.

For reference:
rL241901 - allowed FMF with fcmp
rL255555 - allowed FMF with FP calls

Differential Revision: https://reviews.llvm.org/D61917

llvm-svn: 361401
2019-05-22 15:50:46 +00:00
Simon Pilgrim 3c05cad03e LoopVectorizationCostModel::selectInterleaveCount - assert we have a non-zero loop cost. NFCI.
The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero.

llvm-svn: 361387
2019-05-22 14:18:17 +00:00
Dmitry Preobrazhensky 7773fc478d [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61012

llvm-svn: 361386
2019-05-22 13:59:01 +00:00
Simon Pilgrim 9b40dd6318 [Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI.
Fixes scan-build warning.

llvm-svn: 361375
2019-05-22 12:25:46 +00:00
Simon Pilgrim cfe6fe06ab [VirtualFileSystem] Fix uninitialized variable warning. NFCI.
llvm-svn: 361371
2019-05-22 11:20:52 +00:00
Sjoerd Meijer aa4f1ffca4 [TargetMachine] error message unsupported code model
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:

    fatal error: error in backend: Target does not support the tiny CodeModel
    [Inferior 2 (process 31509) exited with code 0106]
    clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
    (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
    Target: arm-arm-none-eabi
    Thread model: posix
    clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-9: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
    clang-9: note: diagnostic msg:

But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:

fatal error: error in backend: Target does not support the tiny CodeModel

Differential Revision: https://reviews.llvm.org/D62236

llvm-svn: 361370
2019-05-22 10:40:26 +00:00
Martin Storsjo de6038b265 [llvm-dlltool] Respect NONAME keyword
This adds proper handling of the NONAME-keyword, which makes llvm-dlltool
generate an import using the ordinal instead of the name.

Patch by by Jannik Vogel, test added by Stefan Schmidt.

Differential Revision: https://reviews.llvm.org/D62175

llvm-svn: 361367
2019-05-22 09:49:54 +00:00
Clement Courbet f8f93ba90d Re-land r361257 "[MergeICmps][NFC] Make BCEAtom move-only.""
llvm-svn: 361366
2019-05-22 09:45:40 +00:00
Anton Afanasyev df00c6a54f [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

llvm-svn: 361356
2019-05-22 07:41:34 +00:00
Fangrui Song 1c61471ab1 [PPC64] Parse -elfv1 -elfv2 when specified on target triple
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].

The following results are expected:

ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1

ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2

[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html

Patch by Alfredo Dal'Ava Júnior!

Differential Revision: https://reviews.llvm.org/D61950

llvm-svn: 361355
2019-05-22 07:29:59 +00:00
Sjoerd Meijer eec021658b [AArch64] Subtarget crypto extension defaults
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.

Differential Revision: https://reviews.llvm.org/D62180

llvm-svn: 361354
2019-05-22 07:10:27 +00:00
Nikita Popov 15df05152d [X86] Don't compare i128 through vector if construction not cheap (PR41971)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).

Differential Revision: https://reviews.llvm.org/D62220

llvm-svn: 361352
2019-05-22 06:47:06 +00:00
Chen Zheng b727b0483c [PowerPC] use meaningful name for displacement form aligned with x-form - NFC
llvm-svn: 361347
2019-05-22 03:17:39 +00:00
Chen Zheng 9970665f60 [PowerPC] [ISEL] select x-form instruction for unaligned offset
Differential Revision: https://reviews.llvm.org/D62173

llvm-svn: 361346
2019-05-22 02:57:31 +00:00
Pengfei Wang 6a0d432e9e [X86] [CET] Deal with return-twice function such as vfork, setjmp when
CET-IBT enabled

Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D61881

llvm-svn: 361342
2019-05-22 00:50:21 +00:00
Sanjay Patel 6a554188aa [InstCombine] fold shuffles of insert_subvectors
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.

Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rA
https://bugs.llvm.org/show_bug.cgi?id=16739

And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.

Differential Revision: https://reviews.llvm.org/D62024

llvm-svn: 361338
2019-05-22 00:32:25 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Dan Gohman a49496fb2a [WebAssembly] Add the signature for the new llround builtin function
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.

It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.

Differential Revision: https://reviews.llvm.org/D62207

llvm-svn: 361327
2019-05-21 23:06:34 +00:00
Stanislav Mekhanoshin 44d17ca02e Fix register coalescer failure to prune value
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:

    %2:vreg_256 = COPY %1
    %3:vreg_256 = COPY killed %1

The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.

Differential Revision: https://reviews.llvm.org/D62162

llvm-svn: 361293
2019-05-21 19:32:41 +00:00
Leonard Chan 0bada7ce6c [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Craig Topper ed6df47bae [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC
We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.

llvm-svn: 361288
2019-05-21 19:03:45 +00:00
Sanjay Patel 10f6b39899 [SelectionDAG] fold insert subvector of undef into undef
DAGCombiner simplifies this more liberally as:
  // If inserting an UNDEF, just return the original vector.
  if (N1.isUndef())
    return N0;

So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.

llvm-svn: 361287
2019-05-21 18:53:53 +00:00
Sanjay Patel 51dc59d090 [SelectionDAG] remove redundant code; NFCI
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
  // Concat of UNDEFs is UNDEF.
  if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
    return DAG.getUNDEF(VT);

llvm-svn: 361284
2019-05-21 18:28:22 +00:00
Clement Courbet 122c6e6f36 [MergeICmps] Make sorting strongly stable on the rhs.
Summary:
Because the sort order was not strongly stable on the RHS, whether the
chain could merge would depend on the order of the blocks in the Phi.

EXPENSIVE_CHECKS would shuffle the blocks before sorting, resulting in
non-deterministic merging.

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62193

llvm-svn: 361281
2019-05-21 17:58:42 +00:00
Simon Pilgrim 4b82e50315 [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692

llvm-svn: 361270
2019-05-21 15:20:24 +00:00
Sanjay Patel 78c3f58122 [DAGCombiner] prevent unsafe reassociation of FP ops
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.

This was noted as a potential risk (and that risk might increase) with:
D62191

llvm-svn: 361268
2019-05-21 14:47:38 +00:00
Clement Courbet 8361a10493 Revert r361257 "[MergeICmps][NFC] Make BCEAtom move-only."
Broke some bots.

llvm-svn: 361263
2019-05-21 14:24:46 +00:00
Clement Courbet 8fa970c2d8 [MergeICmps][NFC] Make BCEAtom move-only.
And handle for self-move. This is required so that llvm::sort can work
with EXPENSIVE_CHECKS, as it will do a random shuffle of the input
which can result in self-moves.

llvm-svn: 361257
2019-05-21 13:34:12 +00:00
Florian Hahn f9b28e53c7 [ScheduleDAGInstrs] Compute topological ordering on demand.
In most cases, the topological ordering does not get changed in
ScheduleDAGInstrs. We can compute the ordering on demand, similar to
D60125.

This drastically cuts down the number of times we need to compute the
topological ordering, e.g. for SPEC2006, SPEC2k and MultiSource, we get
the following stats for -O3 -flto on X86 (showing the top reductions,
with small absolute values filtered). The smallest reduction is -50%.

Slightly positive impact on compile-time (-0.1 % geomean speedup for
test-suite + SPEC & co, with -O1 on X86)

Tests: 243
Metric: pre-RA-sched.NumTopoInits

Program                                        base       patch  diff
 test-suite...ngs-C/fixoutput/fixoutput.test   115.00      3.00   -97.4%
 test-suite...ks/Prolangs-C/cdecl/cdecl.test   957.00     26.00   -97.3%
 test-suite...math/automotive-basicmath.test   107.00      3.00   -97.2%
 test-suite...rolangs-C++/deriv2/deriv2.test   144.00      6.00   -95.8%
 test-suite...lowfish/security-blowfish.test   410.00     18.00   -95.6%
 test-suite...frame_layout/frame_layout.test   441.00     23.00   -94.8%
 test-suite...rolangs-C++/employ/employ.test   159.00     11.00   -93.1%
 test-suite...s/Ptrdist/anagram/anagram.test   157.00     11.00   -93.0%
 test-suite...s-C/unix-smail/unix-smail.test   829.00     59.00   -92.9%
 test-suite...chmarks/Olden/power/power.test   154.00     11.00   -92.9%
 test-suite...T95/147.vortex/147.vortex.test   19876.00  1434.00  -92.8%
 test-suite...000/255.vortex/255.vortex.test   19881.00  1435.00  -92.8%
 test-suite...ce/Applications/Burg/burg.test   2203.00   168.00   -92.4%
 test-suite...urce/Applications/hbd/hbd.test   1067.00    85.00   -92.0%
 test-suite...ternal/HMMER/hmmcalibrate.test   3145.00   251.00   -92.0%
 test-suite.../Applications/spiff/spiff.test   1037.00    84.00   -91.9%
 test-suite...SPEC/CINT95/130.li/130.li.test   5913.00   487.00   -91.8%
 test-suite.../CINT95/134.perl/134.perl.test   12532.00  1041.00  -91.7%
 test-suite...ce/Benchmarks/Olden/bh/bh.test   220.00     19.00   -91.4%
 test-suite :: External/Nurbs/nurbs.test       2304.00   206.00   -91.1%
 test-suite...arks/VersaBench/dbms/dbms.test   773.00     75.00   -90.3%
 test-suite...ce/Applications/siod/siod.test   9043.00   878.00   -90.3%
 test-suite...pplications/treecc/treecc.test   4510.00   438.00   -90.3%
 test-suite...T2006/456.hmmer/456.hmmer.test   7093.00   697.00   -90.2%
 test-suite...s-C/Pathfinder/PathFinder.test   882.00     87.00   -90.1%
 test-suite.../CINT2000/176.gcc/176.gcc.test   64978.00  6721.00  -89.7%
 test-suite...cations/hexxagon/hexxagon.test   657.00     69.00   -89.5%
 test-suite...fice-ispell/office-ispell.test   2712.00   285.00   -89.5%
 test-suite.../CINT2006/403.gcc/403.gcc.test   139613.00 14992.00 -89.3%
 test-suite...lications/ClamAV/clamscan.test   25880.00  2785.00  -89.2%

Reviewers: MatzeB, atrick, efriedma, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60839

llvm-svn: 361253
2019-05-21 13:04:53 +00:00
Paul Robinson 9c56326934 [DebugInfo] Handle '# line "file"' correctly for asm source.
This provides the correct file path for the original source, rather
than the preprocessed source.

Part of the fix for PR41839.

Differential Revision: https://reviews.llvm.org/D62074

llvm-svn: 361248
2019-05-21 11:59:03 +00:00
Bob Haarman 032f87bbb3 Revert r360902 "Resubmit: [Salvage] Change salvage debug info ..."
This reverts commit rr360902. It caused an assertion failure in
lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <=
FragmentSizeInBits) && "new fragment outside of original fragment"'
failed.

PR41931.

llvm-svn: 361246
2019-05-21 11:53:41 +00:00
Paul Robinson 116e8d4876 [DebugInfo] Handle -main-file-name correctly for asm source.
This option provides only the base filename, not a full relative path.

Part of the fix for PR41839.

Differential Revision: https://reviews.llvm.org/D62071

llvm-svn: 361245
2019-05-21 11:52:27 +00:00
Clement Courbet a95d95d392 [MergeICmps] Preserve the dominator tree.
Summary: In preparation for D60318 .

Reviewers: gchatelet, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62068

llvm-svn: 361239
2019-05-21 11:02:23 +00:00
Fangrui Song cd36a2857e [PPC64] Update LocalEntry from assigned symbols
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.

In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.

This change keeps track of these assignments, and update all needed st_other fields when finish() is called.

Patch by Leandro Lupori!

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D56586

llvm-svn: 361237
2019-05-21 10:41:25 +00:00
Florian Hahn 4a8835c655 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Cullen Rhodes 7f47b75d18 [AArch64][SVE2] Asm: add integer unary instructions (predicated)
Summary:
Patch adds support for the following instructions:

    * URECPE, URSQRTE, SQABS, SQNEG

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62129

llvm-svn: 361230
2019-05-21 09:06:51 +00:00
Cullen Rhodes e798e8d9d2 [AArch64][SVE2] Asm: add integer pairwise arithmetic instructions
Summary:
Patch adds support for the following instructions:

    ADDP, SMAXP, UMAXP, SMINP, UMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62128

llvm-svn: 361229
2019-05-21 08:59:00 +00:00
Sam Parker 3141bbd52d [ARM][CGP] Skip nuw in PrepareConstants
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.

Differential Revision: https://reviews.llvm.org/D62057

llvm-svn: 361227
2019-05-21 07:56:47 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
Chen Zheng c4c407a0eb [PowerPC] use more meaningful name - NFC
llvm-svn: 361218
2019-05-21 03:54:42 +00:00
Lang Hames f088e195cc [ORC] Assert that JITDylibs have unique names.
Patch by Praveen Velliengiri. Thanks Praveen!

Differential Revision: https://reviews.llvm.org/D62139

llvm-svn: 361215
2019-05-21 03:23:08 +00:00
Nick Desaulniers 28e351af2a [ORC] fix use-after-move. NFC
Summary:
scan-build flagged a potential use-after-move in debug builds.  It's not
safe that a moved from value contains anything but garbage.  Manually
DRY up these repeated expressions.

Reviewers: lhames

Reviewed By: lhames

Subscribers: hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62112

llvm-svn: 361203
2019-05-20 22:17:43 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Lang Hames 0dcf69eb82 [ORC] Remove some unreachable code.
Fixes http://llvm.org/PR41662.

llvm-svn: 361199
2019-05-20 21:30:33 +00:00
Cameron McInally 8bec58d5f7 [NFC][InstCombine] Add FIXME for one-use check on constant negation transforms.
llvm-svn: 361197
2019-05-20 21:00:42 +00:00
Lang Hames 93d2bdda6b [Support] Renamed member 'Size' to 'AllocatedSize' in MemoryBlock and OwningMemoryBlock.
Rename member 'Size' to 'AllocatedSize' in order to provide a hint that the
allocated size may be different than the requested size. Comments are added to
clarify this point.  Updated the InMemoryBuffer in FileOutputBuffer.cpp to track
the requested buffer size.

Patch by Machiel van Hooren. Thanks Machiel!

https://reviews.llvm.org/D61599

llvm-svn: 361195
2019-05-20 20:53:05 +00:00
Martin Storsjo 4ed18e5ef5 [AArch64] Handle lowering lround on windows, where long is 32 bit
Differential Revision: https://reviews.llvm.org/D62108

llvm-svn: 361192
2019-05-20 19:53:28 +00:00
Cameron McInally 2557ca296a [InstCombine] Add visitFNeg(...) visitor for unary Fneg
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).

Differential Revision: https://reviews.llvm.org/D61693

llvm-svn: 361188
2019-05-20 19:10:30 +00:00
Sanjay Patel 63fa690617 [InstSimplify] update stale comment; NFC
Missed this diff with rL361118.

llvm-svn: 361180
2019-05-20 17:52:18 +00:00
Craig Topper 97d4f7c194 [SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto.
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.

It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.

Differential Revision: https://reviews.llvm.org/D59981

llvm-svn: 361177
2019-05-20 17:08:02 +00:00
Nick Desaulniers bf940622c8 [DWARF] hoist nullptr checks. NFC
Summary:
This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No.
15" (see under #13). It looks like PVS studio flags nullptr checks where
the ptr is used inbetween creation and checking against nullptr.

Reviewers: JDevlieghere, probinson

Reviewed By: JDevlieghere

Subscribers: RKSimon, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62118

llvm-svn: 361176
2019-05-20 16:58:59 +00:00
Craig Topper cac6b76a76 [X86] Add icelake-client and tremont model numbers to getHostCPUName.
llvm-svn: 361174
2019-05-20 16:58:23 +00:00
Nick Desaulniers 639b29b1b5 [INLINER] allow inlining of blockaddresses if sole uses are callbrs
Summary:
It was supposed that Ref LazyCallGraph::Edge's were being inserted by
inlining, but that doesn't seem to be the case.  Instead, it seems that
there was no test for a blockaddress Constant in an instruction that
referenced the function that contained the instruction. Ex:

```
define void @f() {
  %1 = alloca i8*, align 8
2:
  store i8* blockaddress(@f, %2), i8** %1, align 8
  ret void
}
```

When iterating blockaddresses, do not add the function they refer to
back to the worklist if the blockaddress is referring to the contained
function (as opposed to an external function).

Because blockaddress has sligtly different semantics than GNU C's
address of labels, there are 3 cases that can occur with blockaddress,
where only 1 can happen in GNU C due to C's scoping rules:
* blockaddress is within the function it refers to (possible in GNU C).
* blockaddress is within a different function than the one it refers to
(not possible in GNU C).
* blockaddress is used in to declare a global (not possible in GNU C).

The second case is tested in:

```
$ ./llvm/build/unittests/Analysis/AnalysisTests \
  --gtest_filter=LazyCallGraphTest.HandleBlockAddress
```

This patch adjusts the iteration of blockaddresses in
LazyCallGraph::visitReferences to not revisit the blockaddresses
function in the first case.

The Linux kernel contains code that's not semantically valid at -O0;
specifically code passed to asm goto. It requires that asm goto be
inline-able. This patch conservatively does not attempt to handle the
more general case of inlining blockaddresses that have non-callbr users
(pr/39560).

https://bugs.llvm.org/show_bug.cgi?id=39560
https://bugs.llvm.org/show_bug.cgi?id=40722
https://github.com/ClangBuiltLinux/linux/issues/6
https://reviews.llvm.org/rL212077

Reviewers: jyknight, eli.friedman, chandlerc

Reviewed By: chandlerc

Subscribers: george.burgess.iv, nathanchance, mgorny, craig.topper, mengxu.gatech, void, mehdi_amini, E5ten, chandlerc, efriedma, eraman, hiraditya, haicheng, pirama, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58260

llvm-svn: 361173
2019-05-20 16:48:09 +00:00
Bjorn Pettersson eee0f2330d [AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.

The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.

llvm-svn: 361171
2019-05-20 16:41:08 +00:00
Craig Topper af7a188453 [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.

Differential Revision: https://reviews.llvm.org/D62026

llvm-svn: 361169
2019-05-20 16:27:09 +00:00
Craig Topper 203bfdd0f0 [DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC
This changes the isShift variable to include the constant operand
check that was previously in the if statement.

While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.

llvm-svn: 361168
2019-05-20 16:26:55 +00:00
Matt Arsenault 5239298b0d R600: Fix unconditional return in loop
llvm-svn: 361167
2019-05-20 16:22:11 +00:00
Nikita Popov 9060b6df97 [SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.

Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.

There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.

Differential Revision: https://reviews.llvm.org/D61692

llvm-svn: 361166
2019-05-20 16:09:22 +00:00
Matt Arsenault 7c8ec18964 RegAlloc: Fix verifier error with undef identity copies
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.

llvm-svn: 361156
2019-05-20 14:09:36 +00:00
Cullen Rhodes 523789fa6b [AArch64][SVE2] Asm: add SADALP and UADALP instructions
Summary:
This patch adds support for the integer pairwise add and accumulate long
instructions SADALP/UADALP. These instructions are predicated.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62001

llvm-svn: 361154
2019-05-20 13:50:15 +00:00
Cameron McInally 2d2a46db8e [InstSimplify] Teach fsub -0.0, (fneg X) ==> X about unary fneg
Differential Revision: https://reviews.llvm.org/D62077

llvm-svn: 361151
2019-05-20 13:13:35 +00:00
Orlando Cazalet-Hyams ed67bf8d2f Resubmit "[DebugInfo] Update loop metadata for inlined loops"
This reverts commit 95805bc425.
I've squashed the test fix into this commit.

[DebugInfo] Update loop metadata for inlined loops

Currently, when a loop is cloned while inlining function (A) into function (B)
the loop metadata is copied and then not modified at all. The loop metadata can
encode the loop's start and end DILocations. Therefore, the new inlined loop in
function (B) may have loop metadata which shows start and end locations residing
in function (A).

This patch ensures loop metadata is updated while inlining so that the start and
end DILocations are given the "inlinedAt" operand. I've also added a regression
test for this.

This fix is required for D60831 because that patch uses loop metadata to
determine the DILocation for the branches of new loop preheaders.

Reviewers: aprantl, dblaikie, anemet

Reviewed By: aprantl

Subscribers: eraman, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D61933

llvm-svn: 361149
2019-05-20 13:02:30 +00:00
Orlando Cazalet-Hyams 95805bc425 Revert "[DebugInfo] Update loop metadata for inlined loops"
This reverts commit 6e8f1a80cd.
Reverting patch while investigating build bot failure.

llvm-svn: 361143
2019-05-20 11:24:39 +00:00
Guillaume Chatelet e386a01e84 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361140
2019-05-20 11:01:30 +00:00
Petar Jovanovic e85bbf564d [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC)
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D61943

llvm-svn: 361137
2019-05-20 10:35:57 +00:00
Cullen Rhodes 96c5929926 [AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions
Summary:
This patch adds support for the predicated integer halving add/sub
instructions:

    * SHADD, UHADD, SRHADD, URHADD
    * SHSUB, UHSUB, SHSUBR, UHSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D62000

llvm-svn: 361136
2019-05-20 10:35:23 +00:00
Cullen Rhodes 0fc6347b35 [AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions
Summary:
Patch adds support for SQDMLALBT and SQDMLSLBT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61998

llvm-svn: 361135
2019-05-20 10:29:48 +00:00
Fangrui Song 68774edcd6 Use llvm::sort. NFC
llvm-svn: 361134
2019-05-20 10:18:35 +00:00
Sander de Smalen f83cccf917 Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul
The scalar start/accumulator value of the fadd- and fmul reduction
should match the result type of the reduction, as well as the vector
element-type of the input vector. Although this was not explicitly
specified in the LangRef, it was taken for granted in code implementing
the reductions. The patch also fixes the LangRef by adding this
constraint.

Reviewed By: aemerson, nikic

Differential Revision: https://reviews.llvm.org/D60260

llvm-svn: 361133
2019-05-20 09:54:06 +00:00
Orlando Cazalet-Hyams 6e8f1a80cd [DebugInfo] Update loop metadata for inlined loops
Summary:
Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A).

This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this.

This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders.

Reviewers: aprantl, dblaikie, anemet

Reviewed By: aprantl

Subscribers: eraman, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D61933

llvm-svn: 361132
2019-05-20 09:40:44 +00:00
Guillaume Chatelet a760e69840 Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*"
This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed.

llvm-svn: 361130
2019-05-20 09:00:12 +00:00
Guillaume Chatelet fa8c152576 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361129
2019-05-20 08:52:10 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Sanjay Patel 9ef99b4b11 [InstSimplify] fold fcmp (maxnum, X, C1), C2
This is the sibling transform for rL360899 (D61691):

  maxnum(X, GreaterC) == C --> false
  maxnum(X, GreaterC) <= C --> false
  maxnum(X, GreaterC) <  C --> false
  maxnum(X, GreaterC) >= C --> true
  maxnum(X, GreaterC) >  C --> true
  maxnum(X, GreaterC) != C --> true

llvm-svn: 361118
2019-05-19 14:26:39 +00:00
Dinar Temirbulatov 2ff72f6654 [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec().
This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61795

llvm-svn: 361110
2019-05-19 01:30:41 +00:00
Craig Topper 3164b50af7 [X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC
llvm-svn: 361108
2019-05-19 01:01:46 +00:00
Matt Arsenault b04f3258dd GVN: Handle addrspacecast
llvm-svn: 361103
2019-05-18 14:36:06 +00:00
Simon Pilgrim 2b45a70fd6 MemCmpExpansion::getCompareLoadPairs - assert we find a comparison diff. NFCI.
Fix scan-build uninitialized warning and assert the final diff isn't null.

llvm-svn: 361095
2019-05-18 11:31:48 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Sam Clegg 13717bd54b [WebAssembly] Remove expected failure of builtin-location.C test
This seems to have been fixed by https://reviews.llvm.org/D61956

Yay

Differential Revision: https://reviews.llvm.org/D62075

llvm-svn: 361071
2019-05-17 19:55:17 +00:00
Matt Arsenault f3cedf4823 GlobalISel: Define integer min/max instructions
Doesn't attempt to emit them for anything yet, but some legalizations
I want to port use them.

llvm-svn: 361061
2019-05-17 18:36:31 +00:00
Sanjay Patel 926e47751b [InstCombine] move bitcast after insertelement-with-bitcasted-operands
llvm-svn: 361058
2019-05-17 18:06:12 +00:00
Simon Pilgrim 065431c82b [X86][SSE] Fold movmsk(not(x)) -> not(movmsk)
Helps to improve folding of comparisons with movmsk results.

llvm-svn: 361056
2019-05-17 17:56:25 +00:00
Simon Pilgrim 2c2f8e74b9 [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp.
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. 

llvm-svn: 361052
2019-05-17 17:25:55 +00:00
Cameron McInally 067e946859 [InstSimplify] Add unary fneg to `fsub 0.0, (fneg X) ==> X` transform
Differential Revision: https://reviews.llvm.org/D62013

llvm-svn: 361047
2019-05-17 16:47:00 +00:00
Dmitry Preobrazhensky 198611b0ff [AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers
See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61009

llvm-svn: 361045
2019-05-17 16:04:17 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Roman Lebedev 3275060fe8 [InstCombine] canShiftBinOpWithConstantRHS(): drop bogus signbit check
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.

That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.

Reviewers: spatel, craig.topper, efriedma, majnemer

Reviewed By: spatel

Subscribers: llvm-commits, craig.topper

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61938

llvm-svn: 361043
2019-05-17 15:52:49 +00:00
Dmitry Preobrazhensky 5ae3113969 [AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork
See https://bugs.llvm.org/show_bug.cgi?id=41888

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62016

llvm-svn: 361040
2019-05-17 14:57:04 +00:00
Simon Pilgrim 279314e81b [X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling.
We can now rely on generic expansion to handle this.

llvm-svn: 361038
2019-05-17 14:37:19 +00:00
Simon Pilgrim 62c7032c18 [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.

llvm-svn: 361035
2019-05-17 14:04:56 +00:00
Dmitry Preobrazhensky 43fcc79837 [AMDGPU][MC] Enabled expressions for most operands which accept integer values
See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60768

llvm-svn: 361031
2019-05-17 13:17:48 +00:00
Matt Arsenault 1a02d30c87 AMDGPU: Fix unused variable warnings in release builds
llvm-svn: 361030
2019-05-17 12:59:27 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00
Matt Arsenault 6aafc5e19d AMDGPU/GlobalISel: Legalize G_FRINT
llvm-svn: 361026
2019-05-17 12:19:57 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Clement Courbet 90900fbc9f [MergeICmps][NFC] Add more debug.
llvm-svn: 361024
2019-05-17 12:07:51 +00:00
Matt Arsenault 568f193847 AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load
llvm-svn: 361023
2019-05-17 12:02:34 +00:00
Matt Arsenault a3b5a386fa AMDGPU/GlobalISel: Use subreg index instead of extra unmerge
This saves instructions and extra steps, but I'm not sure about
introducing subregister indexes at this point.

llvm-svn: 361022
2019-05-17 12:02:31 +00:00
Matt Arsenault b3dc73634c AMDGPU/GlobalISel: Use waterfall loop for buffer_load
This adds support for more complex waterfall loops that need to handle
operands > 32-bits, and multiple operands.

llvm-svn: 361021
2019-05-17 12:02:27 +00:00
Simon Pilgrim a6d3bd486b [X86] Pull out IsNOT helper. NFCI.
Return the input value for the NOT pattern: (xor X, -1) -> X

llvm-svn: 361012
2019-05-17 10:37:08 +00:00
Clement Courbet 632dfdda16 Re-land r360859: "[MergeICmps] Simplify the code."
With a fix for PR41917: The predecessor list was changing under our feet.

-  for (BasicBlock *Pred : predecessors(EntryBlock_)) {
+  while (!pred_empty(EntryBlock_)) {
+    BasicBlock* const Pred = *pred_begin(EntryBlock_);

llvm-svn: 361009
2019-05-17 09:43:45 +00:00
Rhys Perry c4bc61bad7 [AMDGPU] detect WaW hazards when moving/merging load/store instructions
Summary:
    In order to combine memory operations efficiently, the load/store
    optimizer might move some instructions around. It's usually safe
    to move instructions down past the merged instruction because the
    pass checks if memory operations can be re-ordered.

    Though, the current logic doesn't handle Write-after-Write hazards.

    This fixes a reflection issue with Monster Hunter World and DXVK.

    v2: - rebased on top of master
        - clean up the test case
        - handle WaW hazards correctly

    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130

Original patch by Samuel Pitoiset.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D61313

llvm-svn: 361008
2019-05-17 09:32:23 +00:00
Cullen Rhodes 7f605c3550 [AArch64][SVE2] Asm: add saturating multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61997

llvm-svn: 361005
2019-05-17 09:29:43 +00:00
Cullen Rhodes 334130a199 [AArch64][SVE2] Asm: add integer multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61951

llvm-svn: 361003
2019-05-17 09:19:41 +00:00
Cullen Rhodes 0d47f00821 [AArch64][SVE2] Asm: add integer multiply long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61936

llvm-svn: 361002
2019-05-17 09:04:44 +00:00
Craig Topper ae1597d360 [X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility.
These are tuning flags and won't cause any codegen issue if we inline a function
with a different value.

llvm-svn: 360992
2019-05-17 06:40:21 +00:00
Fangrui Song ad7199f3e6 [PowerPC] Support .reloc *, R_PPC{,64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

llvm-svn: 360990
2019-05-17 06:04:11 +00:00
Fangrui Song ec6dc3089e [GlobalISel] Fix -Wsign-compare on 32-bit -DLLVM_ENABLE_ASSERTIONS=on builds
llvm-svn: 360989
2019-05-17 05:53:39 +00:00
Fangrui Song e18a6ad0b8 [MC][PowerPC] Clean up PPCAsmBackend
Replace the member variable Target with Triple
Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit.
Delete redundant parameters

llvm-svn: 360986
2019-05-17 05:44:26 +00:00
Ben Dunbobbin 1d16515fb4 [ELF] Implement Dependent Libraries Feature
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.

Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.

The design goals were to provide:

- A simple linking model for developers to reason about.
- The ability to to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
  environments (MSVC in particular).

Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.

In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:

1. There are no competing defined symbols in a given set of libraries, or
   if they exist, the program owner doesn't care which is linked to their
   program.
2. There may be circular dependencies between libraries.

The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:

.section ".deplibs","MS",@llvm_dependent_libraries,1
         .asciz "foo"

For LTO, equivalent information to the contents of a the .deplibs section can be
retrieved by the LLD for bitcode input files.

LLD processes the dependent library specifiers in the following way:

1. Dependent libraries which are found from the specifiers in .deplibs sections
   of relocatable object files are added when the linker decides to include that
   file (which could itself be in a library) in the link. Dependent libraries
   behave as if they were appended to the command line after all other options. As
   a consequence the set of dependent libraries are searched last to resolve
   symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
   to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
   strings in a .deplibs section by; first, handling the string as if it was
   specified on the command line; second, by looking for the string in each of the
   library search paths in turn; third, by looking for a lib<string>.a or
   lib<string>.so (depending on the current mode of the linker) in each of the
   library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
   dependent libraries.

Rationale for the above points:

1. Adding the dependent libraries last makes the process simple to understand
   from a developers perspective. All linkers are able to implement this scheme.
2. Error-ing for libraries that are not found seems like better behavior than
   failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
   will affect all of the dependent libraries. There is a potential problem of
   surprise for developers, who might not realize that these options would apply
   to these "invisible" input files; however, despite the potential for surprise,
   this is easy for developers to reason about and gives developers the control
   that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
   find input files. The different search methods are tried by the linker in most
   obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
   ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
   is not necessary: if finer control is required developers can fall back to using
   the command line directly.

RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.

Differential Revision: https://reviews.llvm.org/D60274

llvm-svn: 360984
2019-05-17 03:44:15 +00:00
Fangrui Song 2463239777 [X86] Support .reloc *, R_{386,X86_64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D62014

llvm-svn: 360983
2019-05-17 03:25:39 +00:00
Fangrui Song aa6102ad8e [AArch64] Support .reloc *, R_AARCH64_NONE, *
Summary:
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D61973

llvm-svn: 360981
2019-05-17 03:05:07 +00:00
Fangrui Song 43ca0e9eb8 [ARM] Support .reloc *, R_ARM_NONE, *
R_ARM_NONE can be used to create references among sections. When
--gc-sections is used, the referenced section will be retained if the
origin section is retained.

Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is
ubiquitous on ELF and COFF, and probably available on many other binary
formats. See D62014.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D61992

llvm-svn: 360980
2019-05-17 02:51:54 +00:00
Philip Reames a74d654374 [LFTR] Strengthen assertions in genLoopLimit [NFCI]
llvm-svn: 360978
2019-05-17 02:18:03 +00:00
Philip Reames 45e7690796 [IndVars] Don't reimplement Loop::isLoopInvariant [NFC]
Using dominance vs a set membership check is indistinguishable from a compile time perspective, and the two queries return equivelent results.  Simplify code by using the existing function.

llvm-svn: 360976
2019-05-17 02:09:03 +00:00
Philip Reames 8e169cd266 [LFTR] Factor out a helper function for readability purpose [NFC]
llvm-svn: 360972
2019-05-17 01:39:58 +00:00
Philip Reames 9283f1847c Clarify comments on helpers used by LFTR [NFC]
I'm slowly wrapping my head around this code, and am making comment improvements where I can.

llvm-svn: 360968
2019-05-17 01:12:02 +00:00
Jonas Paulsson 9427961c89 [SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.

Review: Ulrich Weigand
https://reviews.llvm.org/D62036

llvm-svn: 360965
2019-05-17 00:50:35 +00:00
Nico Weber d764e7c660 Revert r360859: "Reland r360771 "[MergeICmps] Simplify the code.""
It caused PR41917.

llvm-svn: 360963
2019-05-17 00:43:53 +00:00
David L. Jones 4a5e01faa4 [X86][AsmParser] Add mnemonics missed in r360954.
These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64
and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These
are covered in clang/test, but not llvm/test.

llvm-svn: 360960
2019-05-17 00:19:20 +00:00
Evgeniy Stepanov 7f281b2c06 HWASan exception support.
Summary:
Adds a call to __hwasan_handle_vfork(SP) at each landingpad entry.

Reusing __hwasan_handle_vfork instead of introducing a new runtime call
in order to be ABI-compatible with old runtime library.

Reviewers: pcc

Subscribers: kubamracek, hiraditya, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D61968

llvm-svn: 360959
2019-05-16 23:54:41 +00:00
David L. Jones add7ed2281 [X86][AsmParser] Ignore "short" even harder in Intel syntax ASM.
In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional
jumps, which indicates the offset should be a "short jump" (8-bit immediate
offset from EIP, -128 to +127). This patch expands to all recognized Jcc
condition codes, and removes the inline restriction.

Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a
couple of Jcc are actually checked, and only inline (i.e., not when using the
integrated assembler for asm sources). A quick search through asm-containing
libraries at hand shows a pretty broad range of Jcc conditions spelled with
"short."

GAS ignores the "short" modifier, and instead uses an encoding based on the
given immediate. MS inline seems to do the same, and I suspect MASM does, too.
NASM will yield an error if presented with an out-of-range immediate value.

Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not
fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY

Differential Revision: https://reviews.llvm.org/D61990

llvm-svn: 360954
2019-05-16 23:27:07 +00:00
David L. Jones 11305984d0 [X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate".
This better matches the verbiage in Intel documentation, and should help avoid
confusion between these two different kinds of values, both of which are parsed
from mnemonics.

llvm-svn: 360953
2019-05-16 23:27:05 +00:00
Reid Kleckner 08c15df29f [X86] Deduplicate symbol lowering logic, NFC
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)

Each of these pieces of code need to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed

I think handling them all in one place will make the code easier to
maintain in the future.

Reviewers: craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61690

llvm-svn: 360952
2019-05-16 23:15:26 +00:00
Amy Huang c2029068bc Emit global variables as S_CONSTANT records for codeview debug info.
Summary:
This emits S_CONSTANT records for global variables.
Currently this emits records for the global variables already being tracked in the
LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs
for static data members and enums.

Related to https://bugs.llvm.org/show_bug.cgi?id=41615

Reviewers: rnk

Subscribers: aprantl, hiraditya, llvm-commits, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61926

llvm-svn: 360948
2019-05-16 22:28:52 +00:00
Tim Renouf e3cbdaf1b5 [CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Differential Revision: https://reviews.llvm.org/D60457

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
2019-05-16 21:49:06 +00:00
Lang Hames c97b50e224 [ORC] Change handling for SymbolStringPtr tombstones and empty keys.
SymbolStringPtr used to use nullptr as its empty value and (since it performed
ref-count operations on any non-nullptr) a pointer to a special pool-entry
instance as its tombstone.

This commit changes the scheme to use two invalid pointer values as the empty
and tombstone values, and broadens the ref-count guard to prevent ref-counting
operations from being performed on these pointers. This should improve the
performance of SymbolStringPtrs used in DenseMaps/DenseSets, as ref counting
operations will no longer be performed on the tombstone.

llvm-svn: 360925
2019-05-16 18:29:34 +00:00
Joerg Sonnenberger ec6ee797ec Fix typos in comment.
llvm-svn: 360921
2019-05-16 18:01:57 +00:00
Craig Topper f09b9d419f [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil.
This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate
in patterns without masking.

llvm-svn: 360915
2019-05-16 16:53:50 +00:00
Don Hinton 8249a8889d [CommandLine] Don't allow duplicate categories.
Summary:
This is a fix to D61574, r360179, that allowed duplicate
OptionCategory's.  This change adds a check to make sure a category can
only be added once even if the user passes it twice.

Reviewed By: MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61972

llvm-svn: 360913
2019-05-16 16:25:13 +00:00
Pavel Labath 2d29e16c30 Minidump: Add support for the MemoryList stream
Summary:
the stream format is exactly the same as for ThreadList and ModuleList
streams, only the entry types are slightly different, so the changes in
this patch are just straight-forward applications of established
patterns.

Reviewers: amccarth, jhenderson, clayborg

Subscribers: markmentovai, lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61885

llvm-svn: 360908
2019-05-16 15:17:30 +00:00
Matt Arsenault 99e6f4d11a AMDGPU: Introduce TokenFactor for ABI register copies in call sequence
The call was missing chain dependencies on the pre-call copies. I
don't think this was causing any real issues however.

llvm-svn: 360906
2019-05-16 15:10:27 +00:00
Matt Arsenault df24c92c0f AMDGPU: Assume xnack is enabled by default
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.

Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.

llvm-svn: 360903
2019-05-16 14:48:34 +00:00
Stephen Tozer 6f59b4b6d9 Resubmit: [Salvage] Change salvage debug info implementation to use DW_OP_LLVM_convert where needed
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645

Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop casts
impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is
now possible, and so can be used to fix the attached bug as well as any cases
where SExt instruction results are lost in the debugging metadata. This patch
introduces this fix by expanding the salvage debug info method to cover these
cases using the new operator.

Differential revision: https://reviews.llvm.org/D61184

llvm-svn: 360902
2019-05-16 14:41:01 +00:00
Sanjay Patel 152f81fae8 [InstSimplify] fold fcmp (minnum, X, C1), C2
minnum(X, LesserC) == C --> false
   minnum(X, LesserC) >= C --> false
   minnum(X, LesserC) >  C --> false
   minnum(X, LesserC) != C --> true
   minnum(X, LesserC) <= C --> true
   minnum(X, LesserC) <  C --> true

maxnum siblings will follow if there are no problems here.

We should be able to perform some other combines when the constants
are equal or greater-than too, but that would go in instcombine.

We might also generalize this by creating an FP ConstantRange
(similar to what we do for integers).

Differential Revision: https://reviews.llvm.org/D61691

llvm-svn: 360899
2019-05-16 14:03:10 +00:00
Xing Xue 2dee094a08 Fixes for builds that require strict X/Open and POSIX compatiblity
Summary:
- Use alternative to MAP_ANONYMOUS for allocating mapped memory if it isn't available
- Use strtok_r instead of strsep as part of getting program path
- Don't try to find the width of a terminal using "struct winsize" and TIOCGWINSZ on POSIX builds. These aren't defined under POSIX (even though some platforms make them available when they shouldn't), so just check if we are doing a X/Open or POSIX compliant build first.

Author: daltenty

Reviewers: hubert.reinterpretcast, xingxue, andusy

Reviewed By: hubert.reinterpretcast

Subscribers: MaskRay, jsji, hiraditya, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61326

llvm-svn: 360898
2019-05-16 14:02:13 +00:00
Adhemerval Zanella 2d28db6b9f [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Fangrui Song e183340c29 Recommit [Object] Change object::SectionRef::getContents() to return Expected<StringRef>
r360876 didn't fix 2 call sites in clang.

Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now.

Follow-up of D61781.

llvm-svn: 360892
2019-05-16 13:24:04 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Matt Arsenault 828b685ebe RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously).  Give hinted a
register a bonus when deciding which value to spill.  This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010

Patch by Matthias Braun

llvm-svn: 360887
2019-05-16 12:50:39 +00:00
Matt Arsenault 27ac8408f6 GlobalISel: Add DstOp version of buildIntrinsic
llvm-svn: 360879
2019-05-16 12:22:56 +00:00
Hans Wennborg 4da9ff9fcf Revert r360876 "[Object] Change object::SectionRef::getContents() to return Expected<StringRef>"
It broke the Clang build, see llvm-commits thread.

> Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now.
>
> Follow-up of D61781.

llvm-svn: 360878
2019-05-16 12:08:34 +00:00
Matt Arsenault a8f88c388f AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor
Bool values should use the scc/vcc regbank since r350611.

llvm-svn: 360877
2019-05-16 12:06:41 +00:00
Fangrui Song a076ec54be [Object] Change object::SectionRef::getContents() to return Expected<StringRef>
Expected<ArrayRef<uint8_t>> may be better but use Expected<StringRef> for now.

Follow-up of D61781.

llvm-svn: 360876
2019-05-16 11:33:48 +00:00
Cullen Rhodes 472c6ef8b0 [AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms
of the CMLA and SQRDCMLAH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61906

llvm-svn: 360871
2019-05-16 09:42:22 +00:00
Cullen Rhodes 07eba98dd7 [AArch64][SVE2] Asm: implement CDOT instruction
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:

Vector form, e.g.
  cdot z0.s, z1.b, z2.b, #90    - complex dot product on four 8-bit quad-tuplets,
                                  accumulating results in 32-bit elements. The
                                  complex numbers in the second source vector are
                                  rotated by 90 degrees.

  cdot z0.d, z1.h, z2.h, #180   - complex dot product on four 16-bit quad-tuplets,
                                  accumulating results in 64-bit elements.
                                  The complex numbers in the second source
                                  vector are rotated by 180 degrees.

Indexed form, e.g.
  cdot z0.s, z1.b, z2.b[3], #0  - complex dot product on four 8-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 32-bit elements.
  cdot z0.d, z1.h, z2.h[1], #0  - complex dot product on four 16-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 64-bit elements.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61903

llvm-svn: 360870
2019-05-16 09:33:44 +00:00
Cullen Rhodes 064f6ab556 [AArch64][SVE2] Asm: add unpredicated integer multiply instructions
Summary:
Add support for the following instructions:

  * MUL (indexed and unpredicated vectors forms)
  * SQDMULH (indexed and unpredicated vectors forms)
  * SQRDMULH (indexed and unpredicated vectors forms)
  * SMULH (unpredicated, predicated form added in SVE)
  * UMULH (unpredicated, predicated form added in SVE)
  * PMUL (unpredicated)

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61902

llvm-svn: 360867
2019-05-16 09:07:26 +00:00
Fangrui Song 3e92df3e39 Add Triple::isPPC64()
llvm-svn: 360864
2019-05-16 08:31:22 +00:00
Clement Courbet c4fdd717ef Reland r360771 "[MergeICmps] Simplify the code."
This revision does not seem to be the culprit.

llvm-svn: 360859
2019-05-16 06:18:02 +00:00
Igor Kudrin 1ff8b7bdf1 [IRMover] Improve diagnostic messages for conflicting metadata
This does the similar for error messages as rL344011 has done for warnings.

With llvm::lto::LTO, the error might appear when LTO::run() is executed.
In that case, the calling code cannot know which module causes the error
and, subsequently, cannot hint the user.

Differential Revision: https://reviews.llvm.org/D61880

llvm-svn: 360857
2019-05-16 05:23:13 +00:00
Matt Arsenault 11be78bc7a GlobalISel: Add buildFConstant for APFloat
llvm-svn: 360853
2019-05-16 04:09:06 +00:00
Matt Arsenault 012ecbbbba GlobalISel: Fix indentation
llvm-svn: 360851
2019-05-16 04:08:46 +00:00
Matt Arsenault 55146d3139 GlobalISel: Add G_FCOPYSIGN
llvm-svn: 360850
2019-05-16 04:08:39 +00:00
Lang Hames c2fb896522 [JITLink][MachO] Use getSymbol64TableEntry for 64-bit MachO files.
Fixes a think-o. No test case: The nlist and nlist64 data structures happen to
line up for this field, so there's no way to construct a failing test case.

llvm-svn: 360830
2019-05-16 00:21:07 +00:00
Craig Topper e43bdf144c [X86] Delay creating index register negations during address matching until after we know for sure the match will succeed
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.

Differential Revision: https://reviews.llvm.org/D61047

llvm-svn: 360823
2019-05-15 21:59:53 +00:00
Reid Kleckner 4882490349 [codeview] Fix SDNode representation of annotation labels
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.

Fixes PR41890

llvm-svn: 360818
2019-05-15 21:46:05 +00:00
Simon Atanasyan 0b0cc23fb6 [mips] Use range-based `for` loops. NFC
llvm-svn: 360817
2019-05-15 21:26:25 +00:00
Mandeep Singh Grang 814435fe87 [AArch64] only indicate CFI on Windows if we emitted CFI
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction.  The technique is
similar to what the x86 backend already does.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40876

Patch by: froydnj (Nathan Froyd)

Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61960

llvm-svn: 360816
2019-05-15 21:23:41 +00:00
Craig Topper 439228727a [X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC
These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the
element size is always known.

One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in
X86ISelLowering and probably low value.

llvm-svn: 360815
2019-05-15 21:16:28 +00:00
Reid Kleckner 7c438c5b07 [codeview] Finish support for reading and writing S_ANNOTATION records
Implement dumping via llvm-pdbutil and llvm-readobj.

llvm-svn: 360813
2019-05-15 20:53:39 +00:00
Pete Couperus 1ca049959f Uncomment LLVM_FALLTHROUGH.
llvm-svn: 360798
2019-05-15 19:46:17 +00:00
Taewook Oh 9d020de3e8 [PredicateInfo] Do not process unreachable operands.
Summary: We should excluded unreachable operands from processing as their DFS visitation order is undefined. When `renameUses` function sorts `OpsToRename` (https://fburl.com/d2wubn60), the comparator assumes that the parent block of the operand has a corresponding dominator tree node. This is not the case for unreachable operands and crashes the compiler.

Reviewers: dberlin, mgrang, davide

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61154

llvm-svn: 360796
2019-05-15 19:35:38 +00:00
Nicolai Haehnle f672b6170c [MachineOperand] Add a ChangeToGA method
Summary:
Analogous to the other ChangeToXXX methods. See the next patch for a
use case.

Change-Id: I6548d614706834fb9109ab3c8fe915e9c6ece2a7

Reviewers: arsenm, kzhuravl

Subscribers: wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61651

llvm-svn: 360789
2019-05-15 17:48:10 +00:00
Nicolai Haehnle 664ceeda68 RegAlloc: try to fail more gracefully when out of registers
Summary:
The emitError path allows the program to continue, unlike report_fatal_error.
This is friendlier to use cases where LLVM is embedded in a larger program,
because the caller may be able to deal with the error somewhat gracefully.

Change the number of requested NOP bytes in the AArch64 and PowerPC
test cases to avoid triggering an unrelated assertion. The compilation
still fails, as verified by the test.

Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d

Reviewers: arsenm, MatzeB

Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61489

llvm-svn: 360786
2019-05-15 17:29:58 +00:00
Hiroshi Yamauchi 7dfd087a9a [JumpThreading] A bug fix for stale loop info after unfold select
Summary:
The return value of a TryToUnfoldSelect call was not checked, which led to an
incorrectly preserved loop info and some crash.

The original crash was reported on https://reviews.llvm.org/D59514.

Reviewers: davidxl, amehsan

Reviewed By: davidxl

Subscribers: fhahn, brzycki, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61920

llvm-svn: 360780
2019-05-15 15:15:16 +00:00
Ryan Taylor 29257eb76c [AMDGPU] Increases available SGPR for Calling Convention
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.

Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61261

llvm-svn: 360778
2019-05-15 14:43:55 +00:00
Cameron McInally 0c82d9b5a2 Teach InstSimplify -X + X --> 0.0 about unary FNeg
Differential Revision: https://reviews.llvm.org/D61916

llvm-svn: 360777
2019-05-15 14:31:33 +00:00
Clement Courbet eaf4413d2d Revert r360771 "[MergeICmps] Simplify the code."
Breaks a bunch of builbdots.

llvm-svn: 360776
2019-05-15 14:21:59 +00:00
Clement Courbet 0d071be474 [MergeICmps] Fix r360771.
Twine references a StringRef by reference, not value...

llvm-svn: 360775
2019-05-15 14:00:45 +00:00
Stephen Tozer 0d02f2ff4f Revert "[Salvage] Change salvage debug info implementation to use DW_OP_LLVM_convert where needed"
This reverts r360772 due to build issues.
Reverted commit: 17dd4d7403.

llvm-svn: 360773
2019-05-15 13:41:44 +00:00
Stephen Tozer 17dd4d7403 [Salvage] Change salvage debug info implementation to use DW_OP_LLVM_convert where needed
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645

Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. With the recent addition of DW_OP_LLVM_convert this
salvaging is now possible, and so can be used to fix the attached bug as
well as any cases where SExt instruction results are lost in the
debugging metadata. This patch introduces this fix by expanding the
salvage debug info method to cover these cases using the new operator.

Differential revision: https://reviews.llvm.org/D61184

llvm-svn: 360772
2019-05-15 13:15:48 +00:00
Clement Courbet 157ae639fa [MergeICmps] Simplify the code.
Instead of patching the original blocks, we now generate new blocks and
delete the old blocks. This results in simpler code with a less twisted
control flow (see the change in `entry-block-shuffled.ll`).

This will make https://reviews.llvm.org/D60318 simpler by making it more
obvious where control flow created and deleted.

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61736

llvm-svn: 360771
2019-05-15 13:04:24 +00:00
Simon Pilgrim 2dd6a0c0c3 Revert rL360675 : [APFloat] APFloat::Storage::Storage - fix use after move
This was mentioned both in https://www.viva64.com/en/b/0629/ and by scan-build checks
........
There's concerns this may just introduce a use-after-free instead.....

llvm-svn: 360770
2019-05-15 13:03:10 +00:00
David Green 0582b22f10 [ARM] Don't use the Machine Scheduler for cortex-m at minsize
The new cortex-m schedule in rL360768 helps performance, but can increase the
amount of high-registers used. This, on average, ends up increasing the
codesize by a fair amount (because less instructions are converted from T2 to
T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use
the existing DAG scheduler with the RegPressure scheduling preference (at least
until the issues around T2 vs T1 instructions can be improved).

I have also made sure that the Sched::RegPressure dag scheduler is always
chosen for MinSize.

The test shows one case where we increase the number of registers used.

Differential Revision: https://reviews.llvm.org/D61882

llvm-svn: 360769
2019-05-15 12:58:02 +00:00
David Green d2d0f46cd2 [ARM] Cortex-M4 schedule
This patch adds a simple Cortex-M4 schedule, renaming the existing M3
schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM:
https://developer.arm.com/docs/ddi0439/latest

Most of these are 1, with the important exception being loads taking 2
cycles. A few others are also higher, but I don't believe they make a
large difference. I've repurposed the M3 schedule as the latencies are
mostly the same between the two cores, with the M4 having more FP and
DSP instructions. We also turn on MISched and UseAA for the cores that
now use this.

It also adds some schedule Write's to various instruction to make things
simpler.

Differential Revision: https://reviews.llvm.org/D54142

llvm-svn: 360768
2019-05-15 12:41:58 +00:00
Simon Atanasyan 4c68c5ae71 [mips] LLVM and GAS now use same instructions for CFA Definition. NFCI
LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set
the register and offset for current CFA rule. We change it to
`DW_CFA_def_cfa_register` which is the same one used by GAS that only
changes the register but keeping the old offset.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D61899

llvm-svn: 360765
2019-05-15 12:05:27 +00:00
Florian Hahn 9e778e6c73 [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC)
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.

As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.

Reviewers: Ayal, hsaito, dcaballe, rengolin

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D61638

llvm-svn: 360758
2019-05-15 10:05:49 +00:00
Clement Courbet d9d0665d1c [[DAGCombiner][NFC] Add a comment.
As suggested in D61846.

llvm-svn: 360755
2019-05-15 08:21:18 +00:00
Craig Topper 384d46c0d5 [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.
They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.

This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.

llvm-svn: 360747
2019-05-15 04:15:46 +00:00
Fangrui Song f4dfd63c74 [IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0

For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.

For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).

Reviewed By: rnk, dexonsmith

Differential Revision: https://reviews.llvm.org/D61547

llvm-svn: 360742
2019-05-15 02:35:32 +00:00
Philip Reames 658cad1287 [NFC] Reuse a helper function to eliminate duplicate code
llvm-svn: 360740
2019-05-15 01:39:07 +00:00
Richard Trieu 5f7d4ab5f9 [XCore] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360738
2019-05-15 01:28:30 +00:00
Richard Trieu 0116385452 [X86] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360736
2019-05-15 01:17:58 +00:00
Richard Trieu c6c421379d [WebAssembly] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360735
2019-05-15 01:03:00 +00:00
Richard Trieu 1e6f98b89d [SystemZ] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360734
2019-05-15 00:46:18 +00:00
Richard Trieu cf82d4a483 [Sparc] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360733
2019-05-15 00:35:37 +00:00
Richard Trieu 51fc56d603 [RISCV] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360732
2019-05-15 00:24:15 +00:00
Richard Trieu ee6ced196d [PowerPC] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360731
2019-05-15 00:09:58 +00:00
Richard Trieu e8f83befd5 [NVPTX] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360729
2019-05-14 23:56:18 +00:00
Richard Trieu a57ce32eff [MSP430] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360728
2019-05-14 23:45:18 +00:00
Richard Trieu 313b78150c [Mips] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360727
2019-05-14 23:34:37 +00:00
Richard Trieu 2e50dc78c5 [Lanai] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360726
2019-05-14 23:17:18 +00:00
Richard Trieu 7ef172998b [Hexagon] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360724
2019-05-14 23:04:55 +00:00
Richard Trieu a68ee931e6 [BPF] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360722
2019-05-14 22:54:06 +00:00
Richard Trieu e982b42003 [AVR] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360721
2019-05-14 22:41:58 +00:00
Philip Reames 445f942fc4 Use an offset from TOS for idempotent rmw locked op lowering
This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.

Differential Revision: https://reviews.llvm.org/D61862

llvm-svn: 360719
2019-05-14 22:32:42 +00:00
Richard Trieu f3011b9b10 [ARM] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360718
2019-05-14 22:29:50 +00:00
Richard Trieu 7f9a008a2d [ARC] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360716
2019-05-14 22:06:04 +00:00
Richard Trieu 8ce2ee9d56 [AMDGPU] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360713
2019-05-14 21:54:37 +00:00
Richard Trieu b26592e04d [AArch64] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360709
2019-05-14 21:33:53 +00:00
Leonard Chan 0cdd3b1d81 [NewPM] Port HWASan and Kernel HWASan
Port hardware assisted address sanitizer to new PM following the same guidelines as msan and tsan.

Changes:
- Separate HWAddressSanitizer into a pass class and a sanitizer class.
- Create new PM wrapper pass for the sanitizer class.
- Use the getOrINsert pattern for some module level initialization declarations.
- Also enable kernel-kwasan in new PM
- Update llvm tests and add clang test.

Differential Revision: https://reviews.llvm.org/D61709

llvm-svn: 360707
2019-05-14 21:17:21 +00:00
Florian Hahn 53c9d585b5 [LICM] Allow AliasSetMap to contain top-level loops.
When an outer loop gets deleted by a different pass, before LICM visits
it, we cannot clean up its sub-loops in AliasSetMap, because at the
point we receive the deleteAnalysisLoop callback for the outer loop, the loop
object is already invalid and we cannot access its sub-loops any longer.

Reviewers: asbirlea, sanjoy, chandlerc

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D61904

llvm-svn: 360704
2019-05-14 19:41:36 +00:00
Dmitry Preobrazhensky ee51d851ea [AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D61905

llvm-svn: 360702
2019-05-14 19:16:24 +00:00
Nikita Popov 48c4e4fa80 [LVI][CVP] Add support for abs/nabs select pattern flavor
Based on ConstantRange support added in D61084, we can now handle
abs and nabs select pattern flavors in LVI.

Differential Revision: https://reviews.llvm.org/D61794

llvm-svn: 360700
2019-05-14 18:53:47 +00:00
Alina Sbirlea 80c6e79602 [MemorySSA] LoopSimplify preserves MemorySSA only when flag is flipped.
LoopSimplify can preserve MemorySSA after r360270.
But the MemorySSA analysis is retrieved and preserved only when the
EnableMSSALoopDependency is set to true. Use the same conditional to
mark the pass as preserved, otherwise subsequent passes will get an
invalid analysis.
Resolves PR41853.

llvm-svn: 360697
2019-05-14 18:07:18 +00:00
Philip Reames 75ad8c5d63 Fix a release mode warning introduced in r360694
llvm-svn: 360696
2019-05-14 17:50:06 +00:00
Philip Reames bd8d309111 [IndVars] Extend reasoning about loop invariant exits to non-header blocks
Noticed while glancing through the code for other reasons.  The extension is trivial enough, decided to just do it.

llvm-svn: 360694
2019-05-14 17:20:10 +00:00
Cameron McInally 7c5c0c9fe5 Support FNeg in SpeculativeExecution pass
Differential Revision: https://reviews.llvm.org/D61910

llvm-svn: 360692
2019-05-14 16:51:18 +00:00
Stanislav Mekhanoshin 05791d90c9 [AMDGPU] Fixed handling of imemdiate i1 literals
This bug was exposed by the rL360395.

Differential Revision: https://reviews.llvm.org/D61812

llvm-svn: 360689
2019-05-14 16:18:00 +00:00
Tim Renouf 33cb8f5b54 [AMDGPU] Fixed +DumpCode
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code. Longer term, we should re-implement that by
using the LLVM disassembler from the Vulkan driver.

Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and
with -filetype=obj I think it did not include any instructions, only the
labels. Fixed with this commit: now it has no effect with -filetype=asm,
and works as intended with -filetype=obj.

Differential Revision: https://reviews.llvm.org/D60682

Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b
llvm-svn: 360688
2019-05-14 16:17:14 +00:00
Simon Pilgrim c2d9cfd925 [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced.

This combine avoids AND immediate mask which usually means we reduce encoding size.

Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily.

Differential Revision: https://reviews.llvm.org/D61830

llvm-svn: 360684
2019-05-14 15:21:28 +00:00
Cullen Rhodes 3b917019a5 [AArch64][SVE2] Asm: add SQRDMLAH/SQRDMLSH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms of the
SQRDMLAH and SQRDMLSH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61515

llvm-svn: 360683
2019-05-14 15:10:16 +00:00
Cullen Rhodes e029da46e6 [AArch64][SVE2] Asm: add integer multiply-add/subtract (indexed) instructions
Summary:
This patch adds support for the following instructions:

  MLA mul-add, writing addend (Zda = Zda +  Zn * Zm[idx])
  MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx])

Predicated forms of these instructions were added in SVE.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61514

llvm-svn: 360682
2019-05-14 15:01:00 +00:00
Fangrui Song 2f6ef2fc92 DWARF v5: emit DW_AT_addr_base if DW_AT_low_pc references .debug_addr
The condition !AddrPool.empty() is tested before attachRangesOrLowHighPC(), which may add an entry to AddrPool. We emit DW_AT_low_pc (DW_FORM_addrx) but may incorrectly omit DW_AT_addr_base for LineTablesOnly. This can be easily reproduced:

clang -gdwarf-5 -gmlt -c a.cc

Fix this by moving !AddrPool.empty() below.

This was discovered while investigating an lld crash (fixed by D61889) on such object files: ld.lld --gdb-index a.o

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D61891

llvm-svn: 360678
2019-05-14 14:37:26 +00:00
Lei Huang 22561972af [PowerPC] Custom lower known CR bit spills
For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill
the known value instead of extracting the bit.

eg. This sequence is currently used to spill a CRUNSET:
    crclr   4*cr5+lt
    mfocrf  r3,4
    rlwinm  r3,r3,20,0,0
    stw     r3,132(r1)

This patch custom lower it to:
    li  r3,0
    stw r3,132(r1)

Differential Revision: https://reviews.llvm.org/D61754

llvm-svn: 360677
2019-05-14 14:27:06 +00:00
Simon Pilgrim 9fd3be294c [APFloat] APFloat::Storage::Storage - fix use after move
This was mentioned both in https://www.viva64.com/en/b/0629/ and by scan-build checks

llvm-svn: 360675
2019-05-14 14:13:30 +00:00
Kit Barton 37b7922daa Save the induction binary operator in IVDescriptors for non FP induction variables.
Summary:
Currently InductionBinOps are only saved for FP induction variables, the PR extends it with non FP induction variable, so user of IVDescriptors can query the InductionBinOps for integer induction variables.

The changes in hasUnsafeAlgebra() and getUnsafeAlgebraInst() are required for the existing LIT test cases to pass. As described in the comment of the two functions, one of the requirement to return true is it is a FP induction variable. The checks was not needed because InductionBinOp was not set on non FP cases before.

https://reviews.llvm.org/D60565 depends on the patch.

Committed on behalf of @Whitney (Whitney Tsang).

Reviewers: jdoerfert, kbarton, fhahn, hfinkel, dmgreen, Meinersbur

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61329

llvm-svn: 360671
2019-05-14 13:26:36 +00:00
Tim Northover 717b62a146 TableGen: support #ifndef in addition to #ifdef.
TableGen has a limited preprocessor, which only really supports
easier.

llvm-svn: 360670
2019-05-14 13:04:25 +00:00
Thomas Preud'homme 7b4ecdd3c2 Reinstate "FileCheck [5/12]: Introduce regular numeric variables"
This reinstates r360578 (git e47362c1ec),
reverted in r360653 (git 004393681c),
with a fix for the list added in FileCheck.rst to build without error.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar,
arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar,
arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60385

llvm-svn: 360665
2019-05-14 11:58:30 +00:00
Simon Pilgrim 2747ee2c83 [X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI.
Fixes scan-build warnings

llvm-svn: 360664
2019-05-14 11:30:39 +00:00
Tim Northover ff6875acd9 AArch64: support binutils-like things on arm64_32.
This adds support for the arm64_32 watchOS ABI to LLVM's low level tools,
teaching them about the specific MachO choices and constants needed to
disassemble things.

llvm-svn: 360663
2019-05-14 11:25:44 +00:00
Tim Northover ed9117f88d GlobalOpt: do not promote globals used atomically to constants.
Some atomic loads are implemented as cmpxchg (particularly if large or
floating), and that usually requires write access to the memory involved
or it will segfault.

We can still propagate the constant value to users we understand though.

llvm-svn: 360662
2019-05-14 11:03:13 +00:00
Simon Pilgrim 15842132d5 [MemorySanitizer] getMMXVectorTy - assert valid element size. NFCI.
Fixes scan-build warnings

llvm-svn: 360658
2019-05-14 10:29:18 +00:00
Diana Picus a568222ddd [IRTranslator] Don't hardcode GEP index type
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.

Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.

Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.

Differential Revision: https://reviews.llvm.org/D61852

llvm-svn: 360656
2019-05-14 09:25:17 +00:00
Thomas Preud'homme 004393681c Revert "FileCheck [5/12]: Introduce regular numeric variables"
This reverts r360578 (git e47362c1ec) to
solve the sphinx build failure on
http://lab.llvm.org:8011/builders/llvm-sphinx-docs buildbot.

llvm-svn: 360653
2019-05-14 08:43:11 +00:00
Philip Reames 3098e44daa [X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets
This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.

Differential Revision: https://reviews.llvm.org/D61863

llvm-svn: 360649
2019-05-14 04:43:37 +00:00
Fangrui Song e1cb2c0f40 [Object] Change ObjectFile::getSectionContents to return Expected<ArrayRef<uint8_t>>
Change
std::error_code getSectionContents(DataRefImpl, StringRef &) const;
to
Expected<ArrayRef<uint8_t>> getSectionContents(DataRefImpl) const;

Many object formats use ArrayRef<uint8_t> as the underlying type, which
is generally better than StringRef to represent binary data, so change
the type to decrease the number of type conversions.

Reviewed By: ruiu, sbc100

Differential Revision: https://reviews.llvm.org/D61781

llvm-svn: 360648
2019-05-14 04:22:51 +00:00
Sanjay Patel 99d6420a82 [SDAG] fix unused variable warning and unneeded indirection; NFC
llvm-svn: 360640
2019-05-14 00:57:31 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Gor Nishanov d64455cd43 [coroutines] Fix spills of static array allocas
Summary:
CoroFrame was not considering static array allocas, and was only ever reserving a single element in the coroutine frame.
This meant that stores to the non-zero'th element would corrupt later frame data.

Store static array allocas as field arrays in the coroutine frame.

Added test.

Committed by Gor Nishanov on behalf of ben-clayton
Reviewers: GorNishanov, modocache

Reviewed By: GorNishanov

Subscribers: Orlando, capn, EricWF, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61372

llvm-svn: 360636
2019-05-13 23:58:24 +00:00
Craig Topper e2966473dd [X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue().
Returning SDValue() makes the caller think that nothing happened and it will
end up executing the Expand path. This generates extra nodes that will need to
be pruned as dead code.

Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a
change and it will take care of replacing uses. This will prevent falling into
the Expand path.

llvm-svn: 360627
2019-05-13 22:17:13 +00:00
Craig Topper 5f999c2bea [X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence.
These are updates to match how isel table would emit a LOCK_OR32mi8 node.

-Use i32 for the immediate zero even though only 8 bits are encoded.
-Use i16 for segment register.
-Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match
64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The
only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1.
-Emit an extra i32 result for the flags output.

I don't know if the types here really matter just noticed it was inconsistent
with normal behavior.

llvm-svn: 360619
2019-05-13 21:01:24 +00:00
Lang Hames 56baade10d [JITLink][MachO] Honor the no-dead-strip flag on nlist entries.
llvm-svn: 360618
2019-05-13 20:52:30 +00:00
Nikita Popov 323dc634b9 [WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841)
Usually this will abort fast-isel at the instruction using the
non-legal result, but if the only use is in a different basic block,
we'll incorrectly assume that the zext/sext is to i32 (rather than
i128 in this case).

Differential Revision: https://reviews.llvm.org/D61823

llvm-svn: 360616
2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin 79b2828b3f [AMDGPU] Reorder includes per coding standard. NFC.
llvm-svn: 360609
2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin 21088639ae [AMDGPU] Remove now unused V2FP16_ONE constant def. NFC.
llvm-svn: 360608
2019-05-13 17:52:57 +00:00
Robert Lougher 91a9d4ef4b Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail.

llvm-svn: 360606
2019-05-13 17:36:46 +00:00
Sanjay Patel 760f61ab36 [InstCombine] try harder to form rotate (funnel shift) (PR20750)
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.

Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):

  %shamt = zext i8 %i to i32
  %m = and i32 %shamt, 31
  %neg = sub i32 0, %shamt
  %and4 = and i32 %neg, 31
  %shl = shl i32 %v, %m
  %shr = lshr i32 %v, %and4
  %or = or i32 %shr, %shl
  =>
  %a = and i8 %i, 31
  %shamt2 = zext i8 %a to i32
  %neg2 = sub i32 0, %shamt2
  %and4 = and i32 %neg2, 31
  %shl = shl i32 %v, %shamt2
  %shr = lshr i32 %v, %and4
  %or = or i32 %shr, %shl

https://rise4fun.com/Alive/V9r

llvm-svn: 360605
2019-05-13 17:28:19 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim 73aee29095 [X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360596
2019-05-13 16:10:11 +00:00
Simon Pilgrim cf5a8eb7cd [X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433)
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op.

This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360594
2019-05-13 16:02:45 +00:00
Simon Pilgrim d3cedee3c6 [TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG
More work for PR39709.

llvm-svn: 360592
2019-05-13 15:51:26 +00:00
Amara Emerson e5248e6b41 Revert "[LSR] Tweak setup cost depth threshold to 10."
Changing the threshold might not be the best long term approach. Revert for now.

llvm-svn: 360589
2019-05-13 15:37:18 +00:00
Simon Pilgrim d9aa928603 [X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709)
Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway.

llvm-svn: 360588
2019-05-13 15:31:27 +00:00
Sanjay Patel 05dafb1c97 [DAGCombiner] narrow vector binop with inserts/extract
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.

The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.

llvm-svn: 360585
2019-05-13 14:31:14 +00:00
Kevin P. Neal 5987749e33 Add constrained fptrunc and fpext intrinsics.
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.

Reviewed by:	Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by:	Craig Topper
Differential Revision: https://reviews.llvm.org/D55897

llvm-svn: 360581
2019-05-13 13:23:30 +00:00
Simon Pilgrim d845bc3d0c TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI.
llvm-svn: 360579
2019-05-13 12:44:03 +00:00
Thomas Preud'homme e47362c1ec FileCheck [5/12]: Introduce regular numeric variables
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces regular numeric
variables which can be set on the command-line.

This commit introduces regular numeric variable that can be set on the
command-line with the -D option to a numeric value. They can then be
used in CHECK patterns in numeric expression with the same shape as
@LINE numeric expression, ie. VAR, VAR+offset or VAR-offset where offset
is an integer literal.

The commit also enable strict whitespace in the verbose.txt testcase to
check that the position or the location diagnostics. It fixes one of the
existing CHECK in the process which was not accurately testing a
location diagnostic (ie. the diagnostic was correct, not the CHECK).

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60385

llvm-svn: 360578
2019-05-13 12:39:08 +00:00
Eugene Leviant 053c6fc2b8 [ThinLTO] Don't internalize weak writeable variables
Variables with linkonce_odr and weak_odr linkage shouldn't be internalized
if they're not readonly. Otherwise we may end up with multiple copies of
such variable, so reads and writes will become inconsistent

Differential revision: https://reviews.llvm.org/D61255

llvm-svn: 360577
2019-05-13 11:53:05 +00:00
Cullen Rhodes 6dcef8fc0c [AArch64][SVE2] Add SVE2 target features to backend and TargetParser
Summary:
This patch adds the following features defined by Arm SVE2 architecture
extension:

  sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm

For existing CPUs these features are declared as unsupported to prevent
scheduler errors.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka

Reviewed By: SjoerdMeijer, rovka

Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61513

llvm-svn: 360573
2019-05-13 10:10:24 +00:00
Ulrich Weigand 8e42f6ddc8 [SystemZ] Model floating-point control register
This adds the FPC (floating-point control register) as a reserved
physical register and models its use by SystemZ instructions.

Note that only the current rounding modes and the IEEE exception
masks are modeled.  *Changes* of the FPC due to exceptions (in
particular the IEEE exception flags and the DXC) are not modeled.

At this point, this patch is mostly NFC, but it will prevent
scheduling of floating-point instructions across SPFC/LFPC etc.

llvm-svn: 360570
2019-05-13 09:47:26 +00:00
Sam Parker a33e311a3b [ARM][ParallelDSP] Relax alias checks
When deciding the safety of generating smlad, we checked for any
writes within the block that may alias with any of the loads that
need to be widened. This is overly conservative because it only
matters when there's a potential aliasing write to a location
accessed by a pair of loads.

Now we check for aliasing writes only once, during setup. If two
loads are found to have an aliasing write between them, we don't add
these loads to LoadPairs. This means that later during the transform,
we can safely widened a pair without worrying about aliasing.

However, to maintain correctness, we also need to change the way that
wide loads are inserted because the order is now important.

The MatchSMLAD method has also been changed, absorbing
MatchReductions and AddMACCandidate to hopefully improve readability.

Differential Revision: https://reviews.llvm.org/D6102

llvm-svn: 360567
2019-05-13 09:23:32 +00:00
Clement Courbet 9afc4764dd [DAGCombiner] Fix invalid alias analysis.
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().

I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.

Fixes PR41855.

Reviewers: niravd, gchatelet, EricWF

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61846

llvm-svn: 360566
2019-05-13 09:07:37 +00:00
Fangrui Song f3be557159 [WebAssembly] Add dependency on WebAssemblyDesc to fix BUILD_SHARED_LIBS=on builds after rL360550
This fixes the link error

ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int)
>>> referenced by WebAssemblyDisassembler.cpp

llvm-svn: 360558
2019-05-13 05:51:39 +00:00
Yonghong Song 98fe9c9869 [BPF] emit BTF sections only if debuginfo available
Currently, without -g, BTF sections may still be emitted with
data sections, e.g., for linux kernel bpf selftest
test_tcp_check_syncookie_kern.c issue discovered by Martin
as shown below.

-bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o
[1] VAR 'results' type_id=0, linkage=global-alloc
[2] VAR '_license' type_id=0, linkage=global-alloc
[3] DATASEC 'license' size=0 vlen=1
        type_id=2 offset=0 size=4
[4] DATASEC 'maps' size=0 vlen=1
        type_id=1 offset=0 size=28

Let disable BTF generation if no debuginfo, which is
the original design.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61826

llvm-svn: 360556
2019-05-13 05:00:23 +00:00
Lang Hames 4513929094 [JITLink] Track section alignment and make sure it is respected during layout.
Previously we had only honored alignments on individual atoms, but
tools/runtimes may assume that the section alignment is respected too.

llvm-svn: 360555
2019-05-13 04:51:31 +00:00
Craig Topper 61e556d2bd Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.

Original commit message:

This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

llvm-svn: 360552
2019-05-13 04:03:35 +00:00
David L. Jones a263aa25e1 [WebAssembly] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360550
2019-05-13 03:32:41 +00:00
Lang Hames 23085ec36d [JITLink] Add a test for zero-filled content.
Also updates RuntimeDyldChecker and llvm-rtdyld to support zero-fill tests by
returning a content address of zero (but no error) for zero-fill atoms, and
treating loads from zero as returning zero.

llvm-svn: 360547
2019-05-12 22:26:33 +00:00
Simon Pilgrim a7fc763082 [X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded.
Removes unnecessary vzeroupper noted in D61806

llvm-svn: 360543
2019-05-12 15:16:29 +00:00
Sanjay Patel a09e686821 [DAGCombiner] try to move bitcast after extract_subvector
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:

       t56: v8i32 = add t59, t58
      t68: v4i64 = bitcast t56
    t73: v2i64 = extract_subvector t68, Constant:i64<2>
  t96: v4i32 = bitcast t73

There are a few wins and neutral diffs in the other tests.

Differential Revision: https://reviews.llvm.org/D61806

llvm-svn: 360541
2019-05-12 14:43:20 +00:00
Simon Pilgrim fda6bffd3b [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well.
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.

This helps us with target shuffles and hops in particular.

llvm-svn: 360535
2019-05-11 21:35:50 +00:00
Simon Pilgrim 605a840747 [DAG] Add SimplifyDemandedBits support for BITREVERSE
Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC

llvm-svn: 360534
2019-05-11 20:56:05 +00:00
Don Hinton 0303e8a3fd [CommandLine] Add long option flag for cl::ParseCommandLineOptions . Part 5 of 5
Summary:
If passed, the long option flag makes the CommandLine parser
mimic the behavior or GNU getopt_long.  Short options are a single
character prefixed by a single dash, and long options are multiple
characters prefixed by a double dash.

This patch was motivated by the discussion in the following thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html

Reviewed By: MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61294

llvm-svn: 360532
2019-05-11 20:27:01 +00:00
Simon Pilgrim 6b10fde69b [CostModel][X86] Add min/max reduction costs for all SSE targets
The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).

I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.

llvm-svn: 360528
2019-05-11 17:12:52 +00:00
Simon Pilgrim e4c5b6d9bd [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling.
Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts

llvm-svn: 360526
2019-05-11 16:07:12 +00:00
Simon Pilgrim 5e0f92acad FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI.
Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable.

llvm-svn: 360525
2019-05-11 16:02:34 +00:00
Craig Topper c9d7484aa3 [X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases.
Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it.

llvm-svn: 360524
2019-05-11 16:00:28 +00:00
Craig Topper 74a436596d [X86] Sink some fast isel code into the only if that uses it. NFC
llvm-svn: 360523
2019-05-11 16:00:19 +00:00
Craig Topper 26f2b13a65 [X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI
llvm-svn: 360522
2019-05-11 16:00:13 +00:00
Simon Pilgrim e7c51137aa HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI.
Only certain extension opcodes are supported - fixes scan build warning.

llvm-svn: 360520
2019-05-11 15:24:34 +00:00
Simon Pilgrim 46d96c02b5 Fix uninitialized variable analyzer warning. NFCI.
llvm-svn: 360516
2019-05-11 11:08:24 +00:00
Simon Pilgrim aeed0a30c0 SelectionDAGISel::CodeGenAndEmitDAG - remove unused variable. NFCI.
llvm-svn: 360514
2019-05-11 11:00:37 +00:00
Craig Topper 682cc09675 [X86] Use getRegClassFor to simplify some code in fast isel. NFCI
No need to select the register class based on type and features. It should
already be setup by X86ISelLowering.

llvm-svn: 360513
2019-05-11 05:18:58 +00:00
Craig Topper 31f7adb94f [X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1.
We were checking for SSE4.1 for FP types, but not integer 128-bit types.

Fixes PR41837.

llvm-svn: 360512
2019-05-11 04:19:33 +00:00
Craig Topper bdef12df8d [X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test.
This test covers the fix from r360475 as well.

llvm-svn: 360511
2019-05-11 04:00:27 +00:00
Richard Trieu d0124bd762 [SystemZ] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360510
2019-05-11 03:36:16 +00:00
Richard Trieu 03fe9d82c4 [Sparc] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360506
2019-05-11 02:59:02 +00:00
Richard Trieu 00ecf67045 [RISCV] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure

llvm-svn: 360505
2019-05-11 02:43:58 +00:00
Richard Trieu 4bdb136b0f [PowerPC] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360502
2019-05-11 02:33:18 +00:00
Richard Trieu 4b620fcf0f [NVPTX] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360500
2019-05-11 02:09:13 +00:00
Richard Trieu 61fb6700a5 [MSP430] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360498
2019-05-11 01:58:52 +00:00
Richard Trieu fa29bee9d0 [Mips] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360497
2019-05-11 01:38:56 +00:00
Richard Trieu 4c3890ddbf [Lanai] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360496
2019-05-11 01:25:58 +00:00
Richard Trieu 48803aa65c [BPF] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360494
2019-05-11 01:13:21 +00:00
Richard Trieu bf9e67b5b9 [AVR] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360493
2019-05-11 01:03:03 +00:00
Richard Trieu 5e3ee4b84e [ARM] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360490
2019-05-11 00:34:07 +00:00
Richard Trieu dcf1ea08e5 [ARC] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360488
2019-05-11 00:13:01 +00:00
Richard Trieu c0bd7bd481 [AMDGPU] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360487
2019-05-11 00:03:35 +00:00
Richard Trieu 7ba0605511 [AArch64] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360486
2019-05-10 23:50:01 +00:00
Richard Trieu f48ef2f2ba [XCore] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360485
2019-05-10 23:36:49 +00:00
Richard Trieu b28b8b7724 [X86] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360484
2019-05-10 23:24:38 +00:00
Jordan Rupprecht 16c7fbd112 Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
This reverts r360171 (git commit a9d6c32eaf). A repro showing the asan/msan failures is forthcoming.

llvm-svn: 360481
2019-05-10 23:20:02 +00:00
Philip Reames 849ef823df Factor out redzone ABI checks [NFCI]
As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particularly optimization?), but there's no common way to check whether the function *has* a red zone.

I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated.

Differential Revision: https://reviews.llvm.org/D61799

llvm-svn: 360479
2019-05-10 22:55:42 +00:00
Lang Hames b3d6073b3c [ORC] Make a narrowing-cast explicit to silence a compiler warning.
llvm-svn: 360478
2019-05-10 22:51:03 +00:00
Lang Hames b0cecfc907 [JITLink][MachO] Mark atoms in sections 'no-dead-strip' set live by default.
If a MachO section has the no-dead-strip attribute set then its atoms should
be preserved, regardless of whether they're public or referenced elsewhere in
the object.

llvm-svn: 360477
2019-05-10 22:24:37 +00:00
Craig Topper df10cc6068 [X86] Disable speculative load hardening for operations with an explicit RSP base.
After D58632, we can create idempotent atomic operations to the top of stack.
This confused speculative load hardening because it thinks accesses should have
virtual register base except for the cases it already excluded.

This commit adds a new exclusion for this case. I'll try to reduce a test case
for this, but this fix was verified to work by the reporter. This should avoid
needing to revert D58632.

llvm-svn: 360475
2019-05-10 22:03:33 +00:00
Reid Kleckner 7eb6b5ffc3 [COFF] Fix .bss section size bug in obj2yaml / yaml2obj
We need to serialize SizeOfRawData through even when there is no data,
as in a .bss section.

Fixes PR41836

llvm-svn: 360473
2019-05-10 21:53:44 +00:00
Craig Topper 114f763f37 [LegalizeVectorOps] Remove calls to LegalizeOp on the return value from ExpandLoad/ExpandStore.
We already updated the LegalizedNodes map at the end of the Expand call. This
would have marked the new node as being mapped to itself. So the LegalizeOp
call will find that an immediately return.

llvm-svn: 360472
2019-05-10 21:42:27 +00:00
Mircea Trofin ff3bed0e61 Skip over prefetches
Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61789

llvm-svn: 360471
2019-05-10 21:27:55 +00:00
Nikita Popov 9f7537bd48 [SDAG] Recursively legalize both vector mulo results
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).

There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.

Differential Revision: https://reviews.llvm.org/D61744

llvm-svn: 360470
2019-05-10 20:42:48 +00:00
Teresa Johnson 37b80122bd [ThinLTO] Auto-hide prevailing linkonce_odr only when all copies eligible
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).

Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.

Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59709

llvm-svn: 360466
2019-05-10 20:08:24 +00:00
Sanjay Patel b37ddeafc0 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 360462
2019-05-10 20:02:30 +00:00
Cameron McInally e75412ab47 Add InstCombine::visitFNeg(...)
Differential Revision: https://reviews.llvm.org/D61784

llvm-svn: 360461
2019-05-10 20:01:04 +00:00
David Blaikie 7598b71488 DebugInfo: Only move types out of type units if they're named or type united
Follow up to r359122, after a bug was reported in it - the original
change too aggressively tried to move related types out of type units,
which included unnamed types (like array types) which can't reasonably
be declared-but-not-defined.

A step beyond that is that some types in type units can be anonymous, if
they are types with a name for linkage purposes (eg: "typedef struct { }
x;"). So ensure those don't get turned into plain declarations (without
signatures) because, lacking names, they can't be resolved to the
definition.

[Also include a fix for llvm-dwarfdump/libDebugInfoDWARF to pretty print
types in type units]

llvm-svn: 360458
2019-05-10 19:15:29 +00:00
Simon Pilgrim 6c3ae79e9b [SLP] Refactor VectorizableTree to use unique_ptr.
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61706

llvm-svn: 360456
2019-05-10 18:55:17 +00:00
Amara Emerson b6af291772 [LSR] Tweak setup cost depth threshold to 10.
The original change introduced a depth limit of 7 which caused a 22% regression
in the Swift MapReduceLazyCollection & Ackermann benchmarks. This new threshold
still ensures that the original test case doesn't hang.

rdar://50359639

llvm-svn: 360444
2019-05-10 17:29:35 +00:00
Fangrui Song 9529c563eb [MC][ELF] Copy top 3 bits of st_other to .symver aliases
On PowerPC64 ELFv2 ABI, the top 3 bits of st_other encode the local
entry offset. A versioned symbol alias created by .symver should copy
the bits from the source symbol.

This partly fixes PR41048. A full fix needs tracking of .set assignments
and updating st_other fields when finish() is called, see D56586.

Patch by Alfredo Dal'Ava Júnior

Differential Revision: https://reviews.llvm.org/D59436

llvm-svn: 360442
2019-05-10 17:09:25 +00:00
Momchil Velikov c396f09ce9 Adjust MachineScheduler to use ProcResource counts
This fix allows the scheduler to take into account the number of instances of
each ProcResource specified. Previously a declaration in a scheduler of
ProcResource<1> would be treated identically to a declaration of
ProcResource<2>. Now the hazard recognizer would report a hazard only after all
of the resource instances are busy.

Patch by Jackson Woodruff and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D51160

llvm-svn: 360441
2019-05-10 16:54:32 +00:00
Robert Lougher 986b6b86bb [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 360436
2019-05-10 15:55:06 +00:00
Simon Pilgrim a0b1518a4a [X86][SSE] Add getHopForBuildVector vector splitting
If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free).

Fixes some of the regressions noted in D61782

llvm-svn: 360435
2019-05-10 15:46:04 +00:00
Michael Liao b284414a1b [InferAddressSpaces] Enhance the handling of cosntexpr.
Summary:
- Constant expressions may not be added in strict postorder as the
  forward instruction scan order. Thus, for a constant express (CE0), if
  its operand (CE1) is used in an previous instruction, they are not in
  postorder. However, different from
  `cloneInstructionWithNewAddressSpace`,
  `cloneConstantExprWithNewAddressSpace` doesn't bookkeep uninferred
  instructions for later resolving. That results in failure of inferring
  constant address.
- This patch adds the support to infer constant expression operand
  recursively, since there won't be loop, if that operand is another
  constant expression.

Reviewers: arsenm

Subscribers: jholewinski, jvesely, wdng, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61760

llvm-svn: 360431
2019-05-10 14:57:42 +00:00
Lei Huang 1ac6e9636c [PowerPC] custom lower `v2f64 fpext v2f32`
Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.

eg. For the following IR
  %0 = load <2 x float>, <2 x float>* %Ptr, align 8
  %1 = fpext <2 x float> %0 to <2 x double>
  ret <2 x double> %1

Pre custom lowering:
  ld r3, 0(r3)
  mtvsrd f0, r3
  xxswapd vs34, vs0
  xscvspdpn f0, vs0
  xxsldwi vs1, vs34, vs34, 3
  xscvspdpn f1, vs1
  xxmrghd vs34, vs0, vs1

After custom lowering:
  lfd f0, 0(r3)
  xxmrghw vs0, vs0, vs0
  xvcvspdp vs34, vs0

Differential Revision: https://reviews.llvm.org/D57857

llvm-svn: 360429
2019-05-10 14:04:06 +00:00
Tim Northover 6c1e3f9493 SelectionDAG: accommodate atomic floating stores.
We were applying a pointer truncation to floating types, which crashed LLVM.
That is Not A Good Thing(TM).

llvm-svn: 360421
2019-05-10 11:23:04 +00:00
Fangrui Song 93b6aa0751 [Object] Move ELF specific ObjectFile::getBuildAttributes to ELFObjectFileBase
Change the return type from std::error_code to Error and make the
function protected.

llvm-svn: 360416
2019-05-10 10:19:08 +00:00
Jeremy Morse a2b780b731 [DebugInfo] Use zero linenos for debug intrinsics when promoting dbg.declare
In certain circumstances, optimizations pick line numbers from debug
intrinsic instructions as the new location for altered instructions. This
is problematic because the line number of a debugging intrinsic is
meaningless (it doesn't produce any machine instruction), only the scope
information is valid. The result can be the line number of a variable
declaration "leaking" into real code from debugging intrinsics, making the
line table un-necessarily jumpy, and potentially different with / without
variable locations.

Fix this by using zero line numbers when promoting dbg.declare intrinsics
into dbg.values: this is safe for debug intrinsics as their line numbers
are meaningless, and reduces the scope for damage / misleading stepping
when optimizations pick locations from the wrong place.

Differential Revision: https://reviews.llvm.org/D59272

llvm-svn: 360415
2019-05-10 10:03:41 +00:00
Fangrui Song e357ca8231 [Object] Change SymbolicFile::printSymbolName to use Error
llvm-svn: 360414
2019-05-10 09:59:04 +00:00
Sam Clegg ea38ac5ba3 [WebAssembly] Don't assume that strongly defined symbols are DSO-local
The current PIC model for WebAssembly is more like ELF in that it
allows symbol interposition.

This means that more functions end up being addressed via the GOT
and fewer directly added to the wasm table.

One effect is a reduction in the number of wasm table entries similar
to the previous attempt in https://reviews.llvm.org/D61539 which was
reverted.

Differential Revision: https://reviews.llvm.org/D61772

llvm-svn: 360402
2019-05-10 01:52:08 +00:00
Sam Clegg 2147365484 [WebAssembly] Remove friend18.C from list of known gcc torture test failures. NFC.
Differential Revision: https://reviews.llvm.org/D61775

llvm-svn: 360401
2019-05-10 01:45:34 +00:00
Mircea Trofin 5c31c05fbd [llvm] X86DiscriminateMemOps: insert debug info when missing
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61735

llvm-svn: 360396
2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin 64196850f0 [AMDGPU] Pattern for v_xor3_b32
This also allows three op patterns to use increased constant bus
limit of GFX10.

Differential Revision: https://reviews.llvm.org/D61763

llvm-svn: 360395
2019-05-10 00:09:01 +00:00
Philip Reames bd588dfd59 [X86] Improve lowering of idemptotent RMW operations
The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.

Differential Revision: https://reviews.llvm.org/D58632

llvm-svn: 360393
2019-05-09 23:23:42 +00:00
Lang Hames 112967833e [JITLink] Fixed a signedness bug when processing X86_64_RELOC_SUBTRACTOR.
Subtractor relocation addends are signed, so we need to read them via signed
int pointers. Accidentally treating 32-bit addends as unsigned leads to
out-of-range errors when we try to add very large (>INT32_MAX) bogus addends.

llvm-svn: 360392
2019-05-09 23:17:41 +00:00
Philip Reames 76ea748d2d Compile time tweak for libcall lookup
If we have a large module which is mostly intrinsics, we hammer the lib call lookup path from CodeGenPrepare.  Adding a fastpath reduces compile by 15% for one such example.

The problem is really more general than intrinsics - a module with lots of non-intrinsics non-libcall calls has the same problem - but we might as well avoid an easy case quickly.

llvm-svn: 360391
2019-05-09 23:13:09 +00:00
Lang Hames 5e332f1992 [ORC] Simplify logic for updating edges when should-discard atoms are pruned.
llvm-svn: 360384
2019-05-09 22:03:58 +00:00
Lang Hames dd61274f77 [JITLink] Improve/fix some JITLink debugging output.
Adds full edge details (rather than just edge targets) when out-of-range errors
are generated. Also fixes a bug where debugging output accessed an invalidated
DenseMap iterator by moving the debugging output above the invalidation point.

llvm-svn: 360383
2019-05-09 22:03:57 +00:00
Lang Hames 5fa4e9d990 [ORC] Fix a formatting bug.
llvm-svn: 360382
2019-05-09 22:03:53 +00:00
Bill Wendling 6ee7f31484 Add ".dword" directive
Summary:
The ".dword" directive is a synonym for ".xword" and is used used
by klibc, a minimalistic libc subset for initramfs.

Reviewers: t.p.northover, nickdesaulniers

Reviewed By: nickdesaulniers

Subscribers: nickdesaulniers, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61719

llvm-svn: 360381
2019-05-09 21:57:44 +00:00
David Blaikie 12faa0d44b DebugInfo/DWARF: Minor expression simplification
llvm-svn: 360377
2019-05-09 21:23:40 +00:00
Cameron McInally 156eb28289 [CodeGen] Add comment about FSUB <-> FNEG xforms
Differential Revision: https://reviews.llvm.org/D61741

llvm-svn: 360366
2019-05-09 19:28:52 +00:00
Stanislav Mekhanoshin a76da34b1d [AMDGPU] gfx1010 v_interp_* instructions
Differential Revision: https://reviews.llvm.org/D61703

llvm-svn: 360364
2019-05-09 18:38:55 +00:00
Simon Pilgrim 93bfa5af48 [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count.

This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this.

Differential Revision: https://reviews.llvm.org/D61308

llvm-svn: 360360
2019-05-09 17:45:01 +00:00
Florian Hahn be10bc71f9 [DAGCombiner] Limit number of nodes explored as store candidates.
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61511

llvm-svn: 360357
2019-05-09 17:05:52 +00:00
Stanislav Mekhanoshin 4d4c9e0757 [AMDGPU] gfx1010 changes for PAL metadata
Differential Revision: https://reviews.llvm.org/D61704

llvm-svn: 360353
2019-05-09 16:34:13 +00:00
Pavel Labath dcdb3c6650 MinidumpYAML: add support for the ThreadList stream
Summary:
The implementation is a pretty straightforward extension of the pattern
used for (de)serializing the ModuleList stream. Since there are other
streams which use the same format (MemoryList and MemoryList64, at
least). I tried to generalize the code a bit so that adding future
streams of this type can be done with less code.

Reviewers: amccarth, jhenderson, clayborg

Subscribers: markmentovai, lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61423

llvm-svn: 360350
2019-05-09 15:13:53 +00:00
David Stuttard 411488b11e [CodeGenPrepare] Limit recursion depth for collectBitParts
Summary:
Seeing some issues for windows debug pathological cases with collectBitParts
recursion (1525 levels of recursion!)
Setting the limit to 64 as this should be sufficient - passes all lit cases

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61728

Change-Id: I7f44cdc6c1badf1c2ccbf1b0c4b6afe27ecb39a1
llvm-svn: 360347
2019-05-09 15:02:10 +00:00
Roman Lebedev 9db0e72570 [X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput)
I've started this cleanup more several times now, but got sidetracked
elsewhere, e.g. by llvm-exegesis problems. Not this time, finally!

This is mainly cleaning up the inverse throughput values,
and a few latencies/uops, based on the llvm-exegesis measured values.

Though this is not complete by any means,
there's certainly more cleanup to be done.

The performance numbers (i've only checked by RawSpeed benchmark) aren't
really surprising - overall this *slightly* (< -1%) improves perf.

llvm-svn: 360341
2019-05-09 13:54:51 +00:00
Sam Parker d7b650cc72 [ARM][CGP] Guard against signext args and sitofp
Add an Argument that has the SExtAttr attached, as well as SIToFP
instructions, as values that generate sign bits. SIToFP doesn't
strictly do this and could be treated as a sink to be sign-extended.

Differential Revision: https://reviews.llvm.org/D61381

llvm-svn: 360331
2019-05-09 11:56:16 +00:00
Simon Pilgrim 38ef296265 [CodeGenPrepare] Ensure we get a non-null result from getTrueOrFalseValue. NFCI.
llvm-svn: 360328
2019-05-09 10:51:26 +00:00
Sven van Haastregt ad9c7e0789 Fix LLVM_USE_PERF build after getPageSize change
Commit r360221 ("[Support] Add error handling to
sys::Process::getPageSize().", 2019-05-08) seems to have missed these
uses of getPageSize().  Update them to getPageSizeEstimate().

llvm-svn: 360322
2019-05-09 10:10:44 +00:00
Diana Picus 3531453371 [ARM GlobalISel] Map DBG_VALUE for types != s32
...and make sure we fail elegantly for unsupported values.

s64 goes into DPR, anything <= 32 into GPR.

llvm-svn: 360321
2019-05-09 09:49:36 +00:00
Hans Wennborg b1b09e5b55 X86WinAllocaExpander: Drop code looking through register copies (PR41786)
This code was never covered by tests, in PR41786 it was pointed out that
the deletion part doesn't work, and in a full Chrome build I was never
able to hit the code path that looks through copies. It seems the
situation it's supposed to handle doesn't actually come up in practice.

Delete it to simplify the code.

Differential revision: https://reviews.llvm.org/D61671

llvm-svn: 360320
2019-05-09 09:22:56 +00:00
Markus Lavin 92d5db524e Make sub-registers index names case sensitive in the MIRParser
Prior to this change sub-register index names are assumed to be lower
case (but they are printed with original casing). This means that if a
target has some upper case characters in its sub-register names then
mir-export directly followed by mir-import is not possible. This also
means that sub-register indices currently are (and will continue to be)
slightly inconsistent with register names which are printed and assumed
to be lower case.

As the current textual representation of mir has a few inconsistencies
in this area it is a bit arbitrary how to address the matter. This
change is towards the direction that we feel is most correct (i.e. case
sensitivity).

Differential Revision: https://reviews.llvm.org/D61499

llvm-svn: 360318
2019-05-09 08:29:04 +00:00
Pengfei Wang c05aad0532 Bugfix for nullptr check by klocwork
Klocwork static check:
Pointer from call to function `DebugLoc::operator DILocation *() const `
may be NULL and will be dereference in function `printExtendedName```
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D61715

llvm-svn: 360317
2019-05-09 08:09:21 +00:00
Bjorn Pettersson 8d19e94f13 [CodeGen] Use "DL.getPointerSizeInBits" instead of "8 * DL.getPointerSize". NFC
llvm-svn: 360315
2019-05-09 08:07:36 +00:00
Petr Hosek 366cda03a8 [NewPM] Setup Passes for KASan and KMSan
While ASan and MSan passes were already ported to new PM, the kernel
variants weren't setup in the pipeline which makes the KASan and KMSan
tests in Clang fail.

Differential Revision: https://reviews.llvm.org/D61664

llvm-svn: 360313
2019-05-09 06:09:35 +00:00
Leonard Chan 95b7abdcc5 [SelectionDAG] Expand ADD/SUBCARRY
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.

Differential Revision: https://reviews.llvm.org/D61411

llvm-svn: 360303
2019-05-09 01:17:48 +00:00
Eric Christopher c93f56d39e Temporarily Revert "[DebugInfo] Terminate more location-list ranges at the end of blocks"
as it was causing significant compile time regressions.

This reverts commit r359426 while we come up with testcases and additional ideas.

llvm-svn: 360301
2019-05-08 23:54:03 +00:00
Sanjay Patel 902b3ecdad [SelectionDAG] fold 'fneg undef' to undef
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.

The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
  Legalizing: t1: f32 = undef
  Trying to expand node
  Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>

Differential Revision: https://reviews.llvm.org/D61516

llvm-svn: 360296
2019-05-08 22:19:52 +00:00
Matt Arsenault 462403a5c8 AMDGPU: Mark scheduler classes as final
llvm-svn: 360294
2019-05-08 22:10:04 +00:00
Matt Arsenault 01434f9377 AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 360293
2019-05-08 22:09:57 +00:00
Thomas Preud'homme 4a8ef1128b [FileCheck] Fix code style of method comments
Summary:
Fix various issues in code style of method comments:
1) Move all heading comments to all non-static methods near their
declaration in the FileCheck.h header file.
2) Harmonize the action verb in doxygen comments for methods to always
be in third person
3) Use \returns instead of free text "return" and "returns".
4) Document a couple more parameters while at it.

Reviewers: jhenderson, probinson, arichardson

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61445

llvm-svn: 360288
2019-05-08 21:47:31 +00:00
Stanislav Mekhanoshin 1dbf721315 [AMDGPU] gfx1010 exp modifications
Differential Revision: https://reviews.llvm.org/D61701

llvm-svn: 360287
2019-05-08 21:23:37 +00:00
Craig Topper 51a17df45d [InstCombine] When turning sext into zext due to known bits, return the new ZExt instead of calling replaceinstuseswith
The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements.

My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement.

Differential Revision: https://reviews.llvm.org/D61695

llvm-svn: 360284
2019-05-08 20:59:21 +00:00
Changpeng Fang 73b7272e7a AMDGPU: Fix a mis-placed bracket
Differential Revision:
  https://reviews.llvm.org/D61430

llvm-svn: 360283
2019-05-08 19:46:04 +00:00
Warren Ristow d27b0c6247 [SCEV] Suppress hoisting insertion point of binops when unsafe
InsertBinop tries to move insertion-points out of loops for expressions
that are loop-invariant. This patch adds a new parameter, IsSafeToHost,
to guard that hoisting. This allows callers to suppress that hoisting
for unsafe situations, such as divisions that may have a zero
denominator.

This fixes PR38697.

Differential Revision: https://reviews.llvm.org/D55232

llvm-svn: 360280
2019-05-08 18:50:07 +00:00
Quentin Colombet 157427245a [RegAllocFast] Scan physcial reg definitions before assigning virtual reg definitions
When assigning the definitions of an instruction we were updating
the available registers while walking the definitions. Some of
those definitions may be from physical registers and thus, they are
not available for other definitions to take, but by the time we see
that we may have already assign these registers to another
virtual register.

Fix that by walking through all the definitions and mark as unavailable
the physical register definitions, then do the virtual register assignments.

PR41790

llvm-svn: 360278
2019-05-08 18:30:26 +00:00
Alina Sbirlea 458c7339e1 [NewPassManager] Add tuning option: SLPVectorization [NFC].
Summary: Mirror tuning option from old pass manager in new pass manager.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61616

llvm-svn: 360276
2019-05-08 17:58:35 +00:00
Craig Topper 493aec3ef5 [FastISel][X86] Support FNeg instruction in target independent fast isel handling
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom

Differential Revision: https://reviews.llvm.org/D61624

llvm-svn: 360273
2019-05-08 17:27:08 +00:00
Alina Sbirlea f31eba6494 [MemorySSA] Teach LoopSimplify to preserve MemorySSA.
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.

Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60833

llvm-svn: 360270
2019-05-08 17:05:36 +00:00
Simon Pilgrim e461e9a77d [AArch64] Remove scan-build "Value stored during its initialization is never read" warnings. NFCI.
llvm-svn: 360268
2019-05-08 16:29:39 +00:00
Simon Pilgrim 12521b2d43 [AArch64] Fix scan-build null/uninitialized pointer warnings. NFCI.
llvm-svn: 360267
2019-05-08 16:27:24 +00:00
Simon Pilgrim e3eec06dde [AMDGPU] Reapplied BFE canonicalization from D60462
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.

llvm-svn: 360263
2019-05-08 15:49:10 +00:00
David Greene 6c433713e9 [Reassociation] Place moved instructions after landing pads
Reassociation's NegateValue moved instructions to the beginning of
blocks (after PHIs) without checking for exception handling pads.
It's possible for reassociation to move something into an exception
handling block so we need to make sure we don't move things too early
in the block.  This change advances the insertion point past any
exception handling pads.

If the block we want to move into contains a catchswitch, we cannot
move into it.  In that case just create a new neg as if we had not
found an existing neg to move.

Differential Revision: https://reviews.llvm.org/D61089

llvm-svn: 360262
2019-05-08 15:44:24 +00:00
Nikita Popov 9fd02a71a3 Revert "[ValueTracking] Improve isKnowNonZero for Ints"
This reverts commit 3b137a4956.

As reported in https://reviews.llvm.org/D60846, this is causing
miscompiles.

llvm-svn: 360260
2019-05-08 14:50:01 +00:00
Simon Pilgrim 2788ad3ee2 [LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI.
Fixes static analyzer undefined/out-of-range shift warnings.

llvm-svn: 360245
2019-05-08 11:22:10 +00:00
Simon Pilgrim ec58090491 [Hexagon] Fix cppcheck reduce variable scope warnings. NFCI.
Also fixes a static analyzer "Value stored to 'S2' during its initialization is never read" warning.

llvm-svn: 360244
2019-05-08 11:02:46 +00:00
Tim Northover 18adcf331b ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions
Using SP in this position is unpredictable in ARMv7. CMP and CMN are not
affected, and of course v8 relaxes this requirement, but that's handled
elsewhere.

llvm-svn: 360242
2019-05-08 10:59:08 +00:00
Simon Pilgrim cced3ecc35 [VPlan] Fix "value never used" static analyzer warning. NFCI.
llvm-svn: 360241
2019-05-08 10:52:26 +00:00
Simon Pilgrim 02937dad69 R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI.
Fixes static analyzer undefined value warning.

llvm-svn: 360239
2019-05-08 10:39:56 +00:00
Andrea Di Biagio 69b8b17945 [MCA] Remove dead assignment. NFC
llvm-svn: 360237
2019-05-08 10:28:56 +00:00
Simon Pilgrim be9ade93d1 [SIMode] Fix typo in Status constructor
As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names.

Differential Revision: https://reviews.llvm.org/D61595

llvm-svn: 360236
2019-05-08 10:24:22 +00:00
Simon Pilgrim 2a09a6cfe2 [DebugInfo] Fix use-after-move warning. NFCI.
Don't rely on DWARFAbbreviationDeclarationSet::extract cleaning the struct up for reuse - the analyzers don't like it.

llvm-svn: 360235
2019-05-08 10:09:57 +00:00
Simon Pilgrim 97a0c54179 Fix cppcheck operator precedence warning. NFCI.
llvm-svn: 360234
2019-05-08 10:07:34 +00:00
Florian Hahn 3c696b3e7c [SCCP] Fix crash when trying to constant-fold terminators multiple times.
If we fold a branch/switch to an unconditional branch to another dead block we
replace the branch with unreachable, to avoid attempting to fold the
unconditional branch.

Reviewers: davide, efriedma, mssimpso, jdoerfert

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D61300

llvm-svn: 360232
2019-05-08 09:09:54 +00:00
QingShan Zhang 0e71a6e755 [CodeGenPrepare] Don't split the store if it is volatile
We shouldn't split the store when it is volatile.

Differential Revision: https://reviews.llvm.org/D61169

llvm-svn: 360228
2019-05-08 07:32:12 +00:00
QingShan Zhang e065af6a42 [NFC] Add a static function to do the endian check
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. 

Differential Revision: https://reviews.llvm.org/D61236

llvm-svn: 360226
2019-05-08 07:21:37 +00:00
Mircea Trofin 0a753938db [llvm] Avoid div by 0 when updating profile weights.
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61661

llvm-svn: 360223
2019-05-08 03:57:25 +00:00
Dan Robertson 3b137a4956 [ValueTracking] Improve isKnowNonZero for Ints
Improve isKnownNonZero for integers in order to improve cttz
optimizations.

Differential Revision: https://reviews.llvm.org/D60846

llvm-svn: 360222
2019-05-08 02:25:08 +00:00
Lang Hames e4b4ab6d26 [Support] Add error handling to sys::Process::getPageSize().
This patch changes the return type of sys::Process::getPageSize to
Expected<unsigned> to account for the fact that the underlying syscalls used to
obtain the page size may fail (see below).

For clients who use the page size as an optimization only this patch adds a new
method, getPageSizeEstimate, which calls through to getPageSize but discards
any error returned and substitues a "reasonable" page size estimate estimate
instead. All existing LLVM clients are updated to call getPageSizeEstimate
rather than getPageSize.

On Unix, sys::Process::getPageSize is implemented in terms of getpagesize or
sysconf, depending on which macros are set. The sysconf call is documented to
return -1 on failure. On Darwin getpagesize is implemented in terms of sysconf
and may also fail (though the manpage documentation does not mention this).
These failures have been observed in practice when highly restrictive sandbox
permissions have been applied. Without this patch, the result is that
getPageSize returns -1, which wreaks havoc on any subsequent code that was
assuming a sane page size value.

<rdar://problem/41654857>

Reviewers: dblaikie, echristo

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59107

llvm-svn: 360221
2019-05-08 02:11:07 +00:00
Reid Kleckner 6bf108d77a [COFF] Use COFF stubs for extern_weak functions
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.

Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.

Fixes PR37598

Reviewers: smeenai, mstorsjo

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61615

llvm-svn: 360207
2019-05-07 23:06:21 +00:00
Sanjay Patel e088d03b9c [ValueTracking] add logic for known-never-nan with minnum/maxnum
From the LangRef: "Returns NaN only if both operands are NaN."

llvm-svn: 360206
2019-05-07 22:58:31 +00:00
Lang Hames 0d8ae1e343 Reapply r360194 "[JITLink] Add support for MachO .alt_entry atoms." with fixes.
This patch modifies MachOAtomGraphBuilder to use setLayoutNext rather than
addEdge, and fixes a bug in the section layout algorithm that could result in
atoms appearing more than once in the section ordering (which resulted in those
atoms being assigned invalid addresses during layout).

llvm-svn: 360205
2019-05-07 22:56:40 +00:00
Lang Hames 1a10101e21 Revert r360194 "[JITLink] Add support for MachO .alt_entry atoms."
The testcase is asserting on some bots - reverting while I investigate.

llvm-svn: 360200
2019-05-07 22:19:29 +00:00
Austin Kerbow 8a3d3a9af6 [AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.

Reviewers: arsenm, msearles, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61564

llvm-svn: 360199
2019-05-07 22:12:15 +00:00
Austin Kerbow 6e6480e216 [CodeGen] Rename DEBUG_TYPE for default hazard recognizer.
Summary:
The DEBUG_TYPE of the default hazard recognizer should be updated to
match the DEBUG_TYPE of the machine-scheduler pass.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61359

llvm-svn: 360198
2019-05-07 22:09:04 +00:00
Lang Hames 2b09b25e48 [JITLink] Add support for MachO .alt_entry atoms.
The MachO .alt_entry directive is applied to a symbol to indicate that it is
locked (in terms of address layout and liveness) to its predecessor atom. I.e.
it is an alternate entry point, at a fixed offset, for the previous atom.

This patch updates MachOAtomGraphBuilder to check for the .alt_entry flag on
symbols and add a corresponding LayoutNext edge to the atom-graph. It also
updates MachOAtomGraphBuilder_x86_64 to generalize handling of the
X86_64_RELOC_SUBTRACTOR relocation: previously either the minuend or
subtrahend of the subtraction had to be the same as the atom being fixed up,
now it is only necessary for the minuend or subtrahend to be locked (via any
chain of alt_entry directives) to the atom being fixed up.

llvm-svn: 360194
2019-05-07 21:35:14 +00:00
Kostya Serebryany b9c5768302 revert r360162 as it breaks most of the buildbots
llvm-svn: 360190
2019-05-07 20:57:11 +00:00
Nikita Popov f610110f1a [ConstantRange] Simplify makeGNWR implementation; NFC
Compute results in more direct ways, avoid subset intersect
operations. Extract the core code for computing mul nowrap ranges
into separate static functions, so they can be reused.

llvm-svn: 360189
2019-05-07 20:34:46 +00:00
Robert Lougher 8681ef8f41 [InstCombine] Add new combine to add folding
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)

I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV

This transform enables the following transform that already exists in
instcombine:

(X | Y) ^ Y --> X & ~Y

As a result, the full expected transform is:

(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)

There already exists the transform in the sub case:

(X | Y) - Y --> X & ~Y

However this does not trigger in the case where Y is constant due to an earlier
transform:

X - (-C) --> X + C

With this new add fold, both the add and sub constant cases are handled.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61517

llvm-svn: 360185
2019-05-07 19:36:41 +00:00
Eric Christopher 4727221734 Make sure that the DAG combiner doesn't merge stores that we explicitly
asked not be greater than preferred vector width for the vectorizer.
Test for both 128 and 256 with a skylake architecture.

llvm-svn: 360183
2019-05-07 19:25:34 +00:00
Sanjay Patel 6a281a7545 [InstCombine] allow sinking fneg operands through an FP min/max
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.

llvm-svn: 360180
2019-05-07 18:58:07 +00:00
Don Hinton 102ec0977d [CommandLine] Allow Options to specify multiple OptionCategory's.
Summary:
It's not uncommon for separate components to share common
Options, e.g., it's common for related Passes to share Options in
addition to the Pass specific ones.

With this change, components can use OptionCategory's to simply help
output even if some of the options are shared.

Reviewed By: MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61574

llvm-svn: 360179
2019-05-07 18:57:01 +00:00
Adrian Prantl e6e8db5e9b Debug Info: Support address space attributes on rvalue references.
DWARF5, 2.12 20ff says that

Any debugging information entry representing a pointer or reference
type [may have a DW_AT_address_class attribute].

The existing code (https://reviews.llvm.org/D29670) seems to take a
quite literal interpretation of that wording. I don't see a reason why
an rvalue reference isn't a reference type in the spirit of that
paragraph. This patch allows rvalue references to also have address
spaces.

rdar://problem/50511483

Differential Revision: https://reviews.llvm.org/D61625

llvm-svn: 360176
2019-05-07 17:42:38 +00:00
Adrian Prantl ccdefb24ad Guard __builtin_available() with __has_builtin to support older host compilers.
llvm-svn: 360174
2019-05-07 17:10:27 +00:00
Florian Hahn a9d6c32eaf [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61397

llvm-svn: 360171
2019-05-07 16:47:27 +00:00
Simon Pilgrim 3044ac058b Avoid use-after-move warnings by using swap instead. NFCI.
Swap should be as quick in these cases, and leaves the original variables in a known (empty) state.

llvm-svn: 360164
2019-05-07 15:45:00 +00:00
Orlando Cazalet-Hyams 78a6062c24 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel

Reviewed By: hfinkel

Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 360162
2019-05-07 15:37:38 +00:00
Keno Fischer a1a4adf4b9 [SCEV] Add explicit representations of umin/smin
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express is
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167

llvm-svn: 360159
2019-05-07 15:28:47 +00:00
Simon Pilgrim debb2b2a1e Fix local shadow variable warning. NFCI.
llvm-svn: 360157
2019-05-07 14:56:34 +00:00
Nemanja Ivanovic b4f028f0f3 [PowerPC] Use the two-constant NR algorithm for refining estimates
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.

Differential revision: https://reviews.llvm.org/D60037

llvm-svn: 360144
2019-05-07 13:48:03 +00:00
George Rimar 0974688a42 [yaml2obj] - Allow setting st_value explicitly for Symbol.
In some cases it is useful to explicitly set symbol's st_name value.
For example, I am using it in a patch for LLD to remove the broken
binary from a test case and replace it with a YAML test.

Differential revision: https://reviews.llvm.org/D61180

llvm-svn: 360137
2019-05-07 12:10:51 +00:00
Diana Picus 0a47fb8884 [ARM GlobalISel] Widen G_SELECT operands
...except for the condition operand.

llvm-svn: 360135
2019-05-07 11:39:30 +00:00
Simon Pilgrim b0f51266b8 [X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773)
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.

llvm-svn: 360134
2019-05-07 11:17:39 +00:00
Simon Pilgrim a80abeea88 Fixed "Value stored to 'Opc' is never read" warning. NFCI.
llvm-svn: 360133
2019-05-07 11:09:16 +00:00
Simon Pilgrim 3c975a0ab5 [X86] Reduce scope of variables where possible. NFCI.
Fixes cppcheck warnings.

llvm-svn: 360131
2019-05-07 10:50:11 +00:00
Diana Picus d6d3808fa4 [ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT
We actually have a couple of G_PTRTOINT to s8 when building clang, so
we should do something about them.

llvm-svn: 360130
2019-05-07 10:48:01 +00:00
Simon Pilgrim c5ac14eef8 Fix uninitialized variable warning. NFCI.
This also fixes a scan-build "array subscript is undefined" warning.

llvm-svn: 360128
2019-05-07 10:30:22 +00:00
Diana Picus d18bac5d19 [ARM GlobalISel] Widen G_GEP index operand
llvm-svn: 360127
2019-05-07 10:11:57 +00:00
Orlando Cazalet-Hyams 0d05177337 Test commit access
llvm-svn: 360125
2019-05-07 09:30:55 +00:00
Nicolai Haehnle 79ea85c6af AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

llvm-svn: 360123
2019-05-07 09:19:09 +00:00
Craig Topper c6d445f9c1 [FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.

I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.

Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma

Reviewed By: cameron.mcinally

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61622

llvm-svn: 360111
2019-05-07 04:25:24 +00:00
Sam Clegg 5f8c2edef3 [WebAssembly] Add more test coverage for reloctions against section symbols
The only known user of this relocation type and symbol type is
the debug info sections, but we were not testing the `--relocatable`
output path.

This change adds a minimal test case to cover relocations against
section symbols includes `--relocatable` output.

Differential Revision: https://reviews.llvm.org/D61623

llvm-svn: 360110
2019-05-07 03:53:16 +00:00
Fangrui Song da82ce99b7 [DebugInfo] Delete TypedDINodeRef
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.

Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D61369

llvm-svn: 360108
2019-05-07 02:06:37 +00:00
Fangrui Song a400ca3f3d [SanitizerCoverage] Use different module ctor names for trace-pc-guard and inline-8bit-counters
Fixes the main issue in PR41693

When both modes are used, two functions are created:
`sancov.module_ctor`, `sancov.module_ctor.$LastUnique`, where
$LastUnique is the current LastUnique counter that may be different in
another module.

`sancov.module_ctor.$LastUnique` belongs to the comdat group of the same
name (due to the non-null third field of the ctor in llvm.global_ctors).

    COMDAT group section [    9] `.group' [sancov.module_ctor] contains 6 sections:
       [Index]    Name
       [   10]   .text.sancov.module_ctor
       [   11]   .rela.text.sancov.module_ctor
       [   12]   .text.sancov.module_ctor.6
       [   13]   .rela.text.sancov.module_ctor.6
       [   23]   .init_array.2
       [   24]   .rela.init_array.2

    # 2 problems:
    # 1) If sancov.module_ctor in this module is discarded, this group
    # has a relocation to a discarded section. ld.bfd and gold will
    # error. (Another issue: it is silently accepted by lld)
    # 2) The comdat group has an unstable name that may be different in
    # another translation unit. Even if the linker allows the dangling relocation
    # (with --noinhibit-exec), there will be many undesired .init_array entries
    COMDAT group section [   25] `.group' [sancov.module_ctor.6] contains 2 sections:
       [Index]    Name
       [   26]   .init_array.2
       [   27]   .rela.init_array.2

By using different module ctor names, the associated comdat group names
will also be different and thus stable across modules.

Reviewed By: morehouse, phosek

Differential Revision: https://reviews.llvm.org/D61510

llvm-svn: 360107
2019-05-07 01:39:37 +00:00
Craig Topper a75630302d [X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched.
The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch.

If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller.

To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32.

Differential Revision: https://reviews.llvm.org/D61457

llvm-svn: 360102
2019-05-06 23:57:42 +00:00
Amy Huang 987b969bab Fix bug in getCompleteTypeIndex in codeview debug info
Summary:
When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because
the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map.

Reviewers: rnk

Subscribers: aprantl, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61460

llvm-svn: 360101
2019-05-06 23:37:03 +00:00
Eli Friedman 2ea088173d [ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.

Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.

I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41417

Differential Revision: https://reviews.llvm.org/D60427

llvm-svn: 360099
2019-05-06 23:21:59 +00:00
Craig Topper 39f1a97417 [FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg.
We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here.

llvm-svn: 360097
2019-05-06 23:09:09 +00:00
Stanislav Mekhanoshin 491746a584 [AMDGPU] gfx1010 verifier changes
Differential Revision: https://reviews.llvm.org/D61521

llvm-svn: 360095
2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin 971cb8b633 [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

llvm-svn: 360094
2019-05-06 22:27:05 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent it's creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Stanislav Mekhanoshin 1bc001dec4 [AMDGPU] gfx1010 memory legalizer
Differential Revision: https://reviews.llvm.org/D61535

llvm-svn: 360087
2019-05-06 21:57:02 +00:00
Jordan Rupprecht 8f14e7cacf Revert "Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This reverts r357452 (git commit 21eb771dcb).

This was causing strange optimization-related test failures on an internal test. Will followup with more details offline.

llvm-svn: 360086
2019-05-06 21:55:05 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Martin Storsjo 899f3cd581 [AArch64] Default to SEH exception handling on MinGW
The SEH implementation is pretty mature at this point.

Differential Revision: https://reviews.llvm.org/D61590

llvm-svn: 360080
2019-05-06 21:18:15 +00:00
Sanjay Patel a6019d5164 [InstCombine] sink FP negation of operands through select
We don't always get this:

Cond ? -X : -Y --> -(Cond ? X : Y)

...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.

Differential Revision: https://reviews.llvm.org/D61604

llvm-svn: 360075
2019-05-06 20:34:05 +00:00
Simon Pilgrim 364ef5db2b Pull out repeated CI->getCalledFunction() calls. NFCI.
llvm-svn: 360070
2019-05-06 19:51:54 +00:00
Craig Topper ad56843dd7 [SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits.
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.

This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.

There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.

Fixes PR41748.

Differential Revision: https://reviews.llvm.org/D61582

llvm-svn: 360069
2019-05-06 19:50:14 +00:00
Amara Emerson 3d1128cc9e [GlobalISel] Handle <1 x T> vector return types properly.
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.

This fixes PR41738.

llvm-svn: 360068
2019-05-06 19:41:01 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Sanjay Patel a64bd09ec4 [InstCombine] reduce code duplication; NFC
llvm-svn: 360059
2019-05-06 17:39:18 +00:00
Nikita Popov d5a403fb80 [ConstantRange] Add srem() support
Add support for srem() to ConstantRange so we can use it in LVI. For
srem the sign of the result matches the sign of the LHS. For the RHS
only the absolute value is important. Apart from that the logic is
like urem.

Just like for urem this is only an approximate implementation. The tests
check a few specific cases and run an exhaustive test for conservative
correctness (but not exactness).

Differential Revision: https://reviews.llvm.org/D61207

llvm-svn: 360055
2019-05-06 16:59:37 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Cameron McInally c3167696bc Add FNeg support to InstructionSimplify
Differential Revision: https://reviews.llvm.org/D61573

llvm-svn: 360053
2019-05-06 16:05:10 +00:00
Sanjay Patel 62f457b137 [InstCombine] reduce code duplication; NFCI
llvm-svn: 360051
2019-05-06 15:35:02 +00:00
Guillaume Chatelet edd69fca3e Modernize repmovsb implementation of x86 memcpy and allow runtime sizes.
Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61593

Fix typo.

Turn this patch into an NFC.

Addressing comments

llvm-svn: 360050
2019-05-06 15:10:19 +00:00
Simon Pilgrim 2a0ef0530b [X86] Fix uninitialized members in constructor warnings. NFCI.
Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning.

llvm-svn: 360047
2019-05-06 14:48:02 +00:00
Alexandre Ganea 799d96ec39 Fix compilation warnings when compiling with GCC 7.3
Differential Revision: https://reviews.llvm.org/D61046

llvm-svn: 360044
2019-05-06 13:41:54 +00:00
Nemanja Ivanovic 70afe4f7e1 [PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.

The culprit commit is r274535.

llvm-svn: 360043
2019-05-06 13:35:49 +00:00
Simon Pilgrim d672d0e246 X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI.
findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this.

llvm-svn: 360037
2019-05-06 11:52:16 +00:00
Simon Pilgrim 97fbc2abfe [LoadStoreVectorizer] vectorizeStoreChain - ensure we find a store type.
Properly initialize store type to null then ensure we find a real store type in the chain.

Fixes scan-build null dereference warning and makes the code clearer.

llvm-svn: 360031
2019-05-06 10:25:11 +00:00
Simon Pilgrim 04dad8f66d [X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning.
scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there).

llvm-svn: 360028
2019-05-06 10:15:34 +00:00
Simon Pilgrim 07d91cd98a [X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle indices. NFCI.
Fixes cppcheck local shadow warning as well.

llvm-svn: 360027
2019-05-06 10:11:24 +00:00
Clement Courbet 9e1f2a7fe7 [SimplifyLibCalls] Simplify bcmp too.
Summary: Fixes PR40699.

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61585

llvm-svn: 360021
2019-05-06 09:15:22 +00:00
Luo, Yuanke beec41c656 Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS  instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Author: LiuTianle

Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60550

llvm-svn: 360017
2019-05-06 08:22:37 +00:00
Fangrui Song 7e55672b22 DWARF v5: fix directory index in the line table
Summary:
Prior to DWARF v5, a directory index of 0 represents DW_AT_comp_dir.

In DWARF v5, the index starts with 0 and Entry.DirIdx is the index into
Prologue.IncludeDirectories.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D61253

llvm-svn: 360015
2019-05-06 08:03:46 +00:00
Markus Lavin a778074165 [DebugInfo] GlobalOpt DW_OP_deref_size instead of DW_OP_deref.
Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert
DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian
targets for same reasons as in D59687.

Differential Revision: https://reviews.llvm.org/D60611

llvm-svn: 360013
2019-05-06 07:20:56 +00:00
Craig Topper f723490e76 [SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error.
Based on PR41748, not all cases are handled in this function.

llvm_unreachable is treated as an optimization hint than can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.

Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.

llvm-svn: 360008
2019-05-06 04:01:49 +00:00
Simon Pilgrim 8462cc3c74 [X86] Pull out repeated Subtarget feature tests. NFCI.
Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc

llvm-svn: 360001
2019-05-05 20:45:20 +00:00
Simon Pilgrim addc90e4e8 [TTI][X86] Make getAddressComputationCost cost value const. NFCI.
llvm-svn: 359999
2019-05-05 20:03:51 +00:00
Roman Lebedev 02569408ef [NFC] BasicBlock: generalize replaceSuccessorsPhiUsesWith(), take Old bb
Thus it does not assume that the old basic block is the basic block
for which we are looking at successors.

Not reviewed, but seems rather trivial, in line with the rest of
previous few patches.

llvm-svn: 359997
2019-05-05 18:59:45 +00:00
Roman Lebedev 1a1b922177 [NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61013

llvm-svn: 359996
2019-05-05 18:59:39 +00:00
Roman Lebedev e3b1d82b53 [NFC] PHINode: introduce replaceIncomingBlockWith() function, use it
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61011

llvm-svn: 359995
2019-05-05 18:59:30 +00:00
Roman Lebedev 7ad5d14f3a [NFC] Instruction: introduce replaceSuccessorWith() function, use it
Summary:
There is `Instruction::getNumSuccessors()`, `Instruction::getSuccessor()`
and `Instruction::setSuccessor()`, but no function to replace every
specified `BasicBlock*` successor with some other specified `BasicBlock*`.
I've found one place where it should clearly be used.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61010

llvm-svn: 359994
2019-05-05 18:59:22 +00:00
Roman Lebedev e5be660e25 [NFC][Utils] deleteDeadLoop(): add an assert that exit block has some non-PHI instruction
Summary:
If `deleteDeadLoop()` is called on such a loop, that has "bad" exit block,
one that e.g. has no terminator instruction, the `DIBuilder::insertDbgValueIntrinsic()`
will be told to insert the Dbg Value Intrinsic after `nullptr`
(since there is no first non-PHI instruction), which will cause it to not insert
those instructions into any basic block. The instructions will be parent-less,
and IR verifier will complain. It is rather obvious to track down the root cause
when that happens, so let's just assert it never happens.

Reviewers: sanjoy, davide, vsk

Reviewed By: vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61008

llvm-svn: 359993
2019-05-05 18:59:12 +00:00
Simon Pilgrim 5170c0e5fe Move getOpcode() call into if statement. NFCI.
Avoids a cppcheck "Local variable name shadows outer variable" warning. 

llvm-svn: 359991
2019-05-05 18:34:38 +00:00
Simon Pilgrim afb0e664e6 [SLPVectorizer] Prefer pre-increments. NFCI.
llvm-svn: 359989
2019-05-05 17:53:09 +00:00
Craig Topper 922e252a70 [LLParser] Remove unused variable after r359987. NFC
llvm-svn: 359988
2019-05-05 17:46:17 +00:00
Craig Topper f6e07c472d [LLParser] Remove unnecessary error check making sure NUW/NSW flags aren't set on a non-integer operation.
Summary: This check appears to be a leftover from when add/sub/mul could be either integer or fp. The NSW/NUW flags are only set for add/sub/mul/shl earlier. And we check that those operations only have integer types just below this. So it seems unnecessary to explicitly error for NUW/NSW being used on a add/sub/mul that have the wrong type that would later error for that.

Reviewers: spatel, dblaikie, jyknight, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61562

llvm-svn: 359987
2019-05-05 17:19:23 +00:00
Craig Topper 8279695d66 [LLParser] Simplify type checking in ParseArithmetic and ParseUnaryOp.
Summary:
These methods previously took a 0, 1, or 2 to indicate what types were allowed, but the 0 encoding which meant both fp and integer types has been unused for years. Its leftover from when add/sub/mul used to be shared between int and fp

Simplify it by changing it to just a bool to distinquish int and fp.

Reviewers: spatel, dblaikie, jyknight, arsenm

Reviewed By: spatel

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61561

llvm-svn: 359986
2019-05-05 17:19:19 +00:00
Craig Topper 41c999bcf5 [Constants] Simplify type checking switch in ConstantExpr::get.
Summary:
Remove duplicate checks that both operands have the same type. This is checked
before the switch.

Use 'integer' or 'floating-point' instead of 'arithmetic' type. I think this
might be a leftover to the days when floating point and integer operations
shared the same opcodes.

Reviewers: spatel, RKSimon, dblaikie

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61558

llvm-svn: 359985
2019-05-05 17:19:16 +00:00
Andrea Di Biagio 0460a3629b [MCA] Notify event listeners when instructions transition to the Pending state. NFCI
llvm-svn: 359983
2019-05-05 16:07:27 +00:00
Cameron McInally 1d0c845d9d Add FNeg IR constant folding support
llvm-svn: 359982
2019-05-05 16:07:09 +00:00
Simon Pilgrim 70ee2def90 [X86] Make X86RegisterInfo(const Triple &TT) constructor explicit.
Fixes cppcheck warning.

llvm-svn: 359981
2019-05-05 12:51:47 +00:00
Simon Pilgrim cbcd9b1b92 [X86] Fix some cppcheck "Local variable name shadows outer variable" warnings. NFCI.
llvm-svn: 359976
2019-05-05 12:00:14 +00:00
Simon Pilgrim 5b05f20a3a [SLPVectorizer] Make getSpillCost() const. NFCI.
Ideally getTreeCost() should be const as well but non-const Type creation would need to be addressed first.

llvm-svn: 359975
2019-05-05 10:37:38 +00:00
Simon Pilgrim 0f89b76b84 [SelectionDAG] Use any_of/all_of where possible. NFCI.
llvm-svn: 359974
2019-05-05 10:30:04 +00:00
Simon Pilgrim 7a2e855a0f Move Value *RHSCIOp def into the scope where its actually used. NFCI.
llvm-svn: 359973
2019-05-05 10:27:45 +00:00
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Stanislav Mekhanoshin 5ddd564e19 [AMDGPU] Fixed asan error after D61536
llvm-svn: 359963
2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin 51d1415a16 AMDGPU] gfx1010 hazard recognizer
Differential Revision: https://reviews.llvm.org/D61536

llvm-svn: 359961
2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin 28a1936f6d [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

llvm-svn: 359959
2019-05-04 04:20:37 +00:00
Lang Hames ce8255f3e2 [JITLink] Add two useful Section operations: find by name, get address range.
These operations were already used in eh-frame registration, and are likely to
be used in other runtime registrations, so this commit moves them into a header
where they can be re-used.

llvm-svn: 359950
2019-05-04 00:23:09 +00:00
Jessica Paquette 910630c1e4 [AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs
This saves us some unnecessary copies.

If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.

Changes here are...

- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.

Also add two tests:

- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved

And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.

llvm-svn: 359940
2019-05-03 22:37:46 +00:00
Stanislav Mekhanoshin d9dcf392c7 [AMDGPU] gfx1010 wait count insertion
Differential Revision: https://reviews.llvm.org/D61534

llvm-svn: 359938
2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin 41bbe101a2 [AMDGPU] gfx1010 s_code_end generation
Also add some missing metadata in the streamer.

Differential Revision: https://reviews.llvm.org/D61531

llvm-svn: 359937
2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin 93f15c922f [AMDGPU] gfx1010 loop alignment
Differential Revision: https://reviews.llvm.org/D61529

llvm-svn: 359935
2019-05-03 21:17:29 +00:00
Mandeep Singh Grang 5dc8aeb26d [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Don Hinton f6eac2dd3b [CommandLine] Enable Grouping for short options by default. Part 4 of 5
Summary:
This change enables `cl::Grouping` for short options --
options with names of a single character.  This is consistent with GNU
getopt behavior.

Reviewers: rnk, MaskRay

Reviewed By: MaskRay

Subscribers: thopre, cfe-commits, MaskRay, rupprecht, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D61270

llvm-svn: 359917
2019-05-03 18:56:25 +00:00
Simon Pilgrim 5d3b100750 [DAGCombine] Remove repeated variables. NFCI.
llvm-svn: 359915
2019-05-03 18:20:28 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Simon Pilgrim 308b5ec1ff [TargetLowering] SimplifySetCC - remove repeated variable. NFCI.
Also reduce scope of Temp variable.

llvm-svn: 359911
2019-05-03 18:02:33 +00:00
Don Hinton c242be40a1 [CommandLine] Change help output to prefix long options with `--` instead of `-`. NFC . Part 3 of 5
Summary:
By default, `parseCommandLineOptions()` will accept either a
`-` or `--` prefix for long options -- options with names longer than
a single character.

While this change does not affect behavior, it will be helpful with a
subsequent change that requires long options use the `--` prefix.

Reviewers: rnk, thopre

Reviewed By: thopre

Subscribers: thopre, cfe-commits, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D61269

llvm-svn: 359909
2019-05-03 17:47:29 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Brian Cain 3428c9daef [hexagon] change AsmParser assertion to error
For immediates that can't be evaluated in assembler-mapped instructions, we
should return 'invalid operand' instead of assert.

llvm-svn: 359905
2019-05-03 16:50:38 +00:00
Craig Topper a8f3840c62 [X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.

But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.

The printing code will still only use the suffix for the memory form where it is needed.

llvm-svn: 359903
2019-05-03 16:15:15 +00:00
Simon Pilgrim b323d5ec7c [X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI.
Merge the if() tests for the various HADD/SUB + Subtarget tests

llvm-svn: 359901
2019-05-03 15:56:06 +00:00
Simon Pilgrim d857f64c31 [SelectionDAG] CreateTopologicalOrder - don't use iterator
We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector

Found by cppcheck

llvm-svn: 359900
2019-05-03 15:50:37 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Matt Arsenault 344d68d3c9 AMDGPU: Remove redundant patterns for shifts
llvm-svn: 359895
2019-05-03 15:08:36 +00:00
Matt Arsenault ada33314a2 AMDGPU: Remove redundant patterns for sub
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.

llvm-svn: 359894
2019-05-03 15:08:35 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Simon Pilgrim bc876df3a5 [TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI.
Only ever used in one block

llvm-svn: 359890
2019-05-03 14:38:24 +00:00
Simon Pilgrim bfdd0f75a8 [X86] Remove repeated variables. NFCI.
llvm-svn: 359889
2019-05-03 14:37:00 +00:00
Simon Pilgrim aa49be4926 Avoid cppcheck operator precedence warnings. NFCI.
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)

llvm-svn: 359884
2019-05-03 13:50:38 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Sean Fertile fd75ee9154 [Object][XCOFF] Add an XCOFF dumper for llvm-readobj.
Patch adds support for dumping of file headers with llvm-readobj. XCOFF
object files are added to test dumping a well formed file, and dumping
both negative timestamps and negative symbol counts, both of which are
allowed in the XCOFF definition.

Differential Revision: https://reviews.llvm.org/D60878

llvm-svn: 359878
2019-05-03 12:57:07 +00:00
Simon Pilgrim e798e3a346 [TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI.
Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings.

llvm-svn: 359877
2019-05-03 12:55:25 +00:00
Anton Afanasyev 6d08b8dbae Revert "[MIR] Add simple PRE pass to MachineCSE"
This reverts commit 9c20156de3.
It breaks stage 2 of clang-ppc64be-linux-multistage.

llvm-svn: 359875
2019-05-03 12:36:22 +00:00
Simon Pilgrim 42d2b604b5 [SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI.
llvm-svn: 359873
2019-05-03 11:32:00 +00:00
Simon Pilgrim bfd00a6440 [SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI.
llvm-svn: 359872
2019-05-03 11:11:03 +00:00
Simon Pilgrim a359ef192b [X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI.
Leftover from before we had the extract128BitVector helpers.

llvm-svn: 359871
2019-05-03 10:32:07 +00:00
Anton Afanasyev 9c20156de3 [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

Reviewers: RKSimon

Subscribers: hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56772

llvm-svn: 359870
2019-05-03 10:30:59 +00:00
Simon Pilgrim 88f9117168 Reduce variable scope to just the if() block its actually used in. NFCI.
llvm-svn: 359869
2019-05-03 10:13:41 +00:00
Craig Topper d724360695 [X86] Add more one checks to masked compare patterns that were missed in r358358.
This covers the patterns we use for widening 128/256 comparisons to 512-bit when
AVX512VL isn't supported.

llvm-svn: 359863
2019-05-03 07:14:05 +00:00
Quentin Colombet c9256cc6ba [IRTranslator] Use the alloc size instead of the store size when translating allocas
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.

For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.

llvm-svn: 359856
2019-05-03 01:23:56 +00:00
Eli Friedman 7238353848 [AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc.
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
2019-05-03 00:59:52 +00:00
Eric Christopher 86e2f169bb Tidy up a comment, fix a typo, remove a comment that's obsolete.
llvm-svn: 359852
2019-05-03 00:15:23 +00:00
Eli Friedman 0b61d220c9 [AArch64][Windows] Compute function length correctly in unwind tables.
The primary fix here is to WinException.cpp: we need to exclude jump
tables when computing the length of a function, or else we fail to
correctly compute the length. (We can only compute the number of bytes
consumed by certain assembler directives after the entire file is
parsed. ".p2align" is one of those directives, and is used by jump table
generation.)

The secondary fix, to MCWin64EH, is to make sure we don't silently
miscompile if we hit a similar situation in the future.

It's possible we could extend ARM64EmitUnwindInfo so it allows function
bodies that contain assembler directives, but that's a lot more
complicated; see the FIXME in MCWin64EH.cpp.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 .

Differential Revision: https://reviews.llvm.org/D61095

llvm-svn: 359849
2019-05-03 00:10:45 +00:00
Alina Sbirlea 0363c3b8bb [MemorySSA] Check that block is reachable when adding phis.
Summary:
Originally the insertDef method was only used when building MemorySSA, and was limiting the number of Phi nodes that it created.
Now it's used for updates as well, and it can create additional Phis needed for correctness.
Make sure no Phis are created in unreachable blocks (condition met during MSSA build), otherwise the renamePass will find a null DTNode.

Resolves PR41640.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61410

llvm-svn: 359845
2019-05-02 23:41:58 +00:00
Alina Sbirlea 151ab4844a [MemorySSA] Refactor removing multiple trivial phis [NFC].
Summary: Create a method to clean up multiple potentially trivial phis, since we will need this often.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61471

llvm-svn: 359842
2019-05-02 23:12:49 +00:00
Craig Topper bf29238e1a [X86] Remove LEA16r references from X86FixupLEAs. NFCI
As far as I know, we never emit LEA16r

llvm-svn: 359840
2019-05-02 22:46:23 +00:00
Craig Topper e1e38d4248 [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.

Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.

This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.

Fixes PR41678

Differential Revision: https://reviews.llvm.org/D61453

llvm-svn: 359837
2019-05-02 22:26:40 +00:00
Craig Topper e8a1cde886 [SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree
As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR

Differential Revision: https://reviews.llvm.org/D61463

llvm-svn: 359836
2019-05-02 22:26:26 +00:00
Evandro Menezes 111df108e6 [AArch64] Update for Exynos
Fix the forwarding of multiplication results for Exynos M4.

llvm-svn: 359834
2019-05-02 22:01:39 +00:00
Craig Topper 47d8865a38 [X86] Remove string literal from an if. NFC
This if used to be an assert that got refactored into an if, but left the string literal behind.

Fixes PR41718

llvm-svn: 359833
2019-05-02 21:57:18 +00:00
Nico Weber 81862f82ee lld-link: Add /force:multipleres extension to make dupe resource diag non-fatal
As a side benefit, lld-link now reports more than one duplicate resource
entry before exiting with an error even if the new flag is not passed.

llvm-svn: 359829
2019-05-02 21:21:55 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Simon Pilgrim df8daf0ef4 [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.

Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K

Differential Revision: https://reviews.llvm.org/D61426

llvm-svn: 359786
2019-05-02 14:00:55 +00:00
Simon Pilgrim 9fa56f7829 [X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI.
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921

llvm-svn: 359782
2019-05-02 12:18:24 +00:00
Fangrui Song 8be28cdc52 [Object] Change getSectionName() to return Expected<StringRef>
Summary:
It currently receives an output parameter and returns
std::error_code. Expected<StringRef> fits for this purpose perfectly.

Differential Revision: https://reviews.llvm.org/D61421

llvm-svn: 359774
2019-05-02 10:32:03 +00:00
Diana Picus 1136ea2d44 [ARM GlobalISel] Fixup r359768
Get rid of local variable used only in assertion.

llvm-svn: 359772
2019-05-02 10:08:29 +00:00
Diana Picus 06a61ccc42 [ARM GlobalISel] Select extensions to < 32 bits
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.

llvm-svn: 359768
2019-05-02 09:28:00 +00:00
Diana Picus 53bcf6f2e7 [ARM GlobalISel] Legalize extensions to < 32 bits
Make it legal to extend from e.g. s1 to s8 or s16.

llvm-svn: 359766
2019-05-02 09:21:46 +00:00
Kang Zhang 1a0d6d6899 [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076

llvm-svn: 359764
2019-05-02 08:15:13 +00:00
Pavel Labath cfc4519ef3 Object/Minidump: Add support for the ThreadList stream
Summary:
The stream contains the list of threads belonging to the process
described by the minidump. Its structure is the same as the ModuleList
stream, and in fact, I have generalized the ModuleList reading code to
handle this stream too.

Reviewers: amccarth, jhenderson, clayborg

Subscribers: llvm-commits, lldb-commits, markmentovai, zturner

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61064

llvm-svn: 359762
2019-05-02 07:45:42 +00:00
Fangrui Song 7d0e8cb1e2 [Support] Don't check MAP_ANONYMOUS, just use MAP_ANON
Though being marked "deprecated" by the Linux man-pages project
(MAP_ANON is a synonym of MAP_ANONYMOUS), it is the mostly widely
available macro - many systems that don't provide MAP_ANONYMOUS have
MAP_ANON. MAP_ANON is also used here and there in compiler-rt.

llvm-svn: 359758
2019-05-02 05:58:09 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin f2baae0abb [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

llvm-svn: 359754
2019-05-02 03:47:23 +00:00
Craig Topper b929a0062e [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D61295

llvm-svn: 359753
2019-05-02 03:25:50 +00:00
Nico Weber 413517ecfe lld-link: Make "duplicate resource" error message a bit more concise
Reduces the error message from:
    lld-link: error: failed to parse .res file: duplicate resource: type STRINGTABLE (ID 6)/name ID 3/language 1033, in test1.res and in test2.res

To:
    lld-link: error: duplicate resource: type STRINGTABLE (ID 6)/name ID 3/language 1033, in test1.res and in test2.res

Make sure every error message emitted by cvtres contains the name of at
least one ".res" file, so that removing the "failed to parse .res file"
string doesn't lose information.

Differential Revision: https://reviews.llvm.org/D61388

llvm-svn: 359749
2019-05-02 01:52:24 +00:00
Bob Haarman a78ab77b6b remove inalloca parameters in globalopt and simplify argpromotion
Summary:
Inalloca parameters require special handling in some optimizations.
This change causes globalopt to strip the inalloca attribute from
function parameters when it is safe to do so, removes the special
handling for inallocas from argpromotion, and replaces it with a
simple check that causes argpromotion to skip functions that receive
inallocas (for when the pass is invoked on code that didn't run
through globalopt first). This also avoids a case where argpromotion
would incorrectly try to pass an inalloca in a register.

Fixes PR41658.

Reviewers: rnk, efriedma

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61286

llvm-svn: 359743
2019-05-02 00:37:36 +00:00
Thomas Preud'homme 288ed91e99 FileCheck [4/12]: Introduce @LINE numeric expressions
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces the @LINE numeric
expressions.

This commit introduces a new syntax to express a relation a numeric
value in the input text must have with the line number of a given CHECK
pattern: [[#<@LINE numeric expression>]]. Further commits build on that
to express relations between several numeric values in the input text.
To help with naming, regular variables are renamed into pattern
variables and old @LINE expression syntax is referred to as legacy
numeric expression.

Compared to existing @LINE expressions, this new syntax allow arbitrary
spacing between the component of the expression. It offers otherwise the
same functionality but the commit serves to introduce some of the data
structure needed to support more general numeric expressions.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60384

llvm-svn: 359741
2019-05-02 00:04:38 +00:00
Hiroshi Yamauchi 1620104034 [PGO][CHR] A bug fix.
Summary: Fix a transformation bug where two scopes share a common instrution to hoist.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61405

llvm-svn: 359736
2019-05-01 22:49:52 +00:00
Lang Hames 42a3b4ff0e [ORC] Pass object buffer ownership back in NotifyEmitted.
Clients who want to regain ownership of object buffers after they have been
linked may now use the NotifyEmitted callback for this purpose.

Note: Currently NotifyEmitted is only called if linking succeeds. If linking
fails the buffer is always discarded.

llvm-svn: 359735
2019-05-01 22:40:23 +00:00
Jessica Paquette a3843fe6f4 [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.

Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.

llvm-svn: 359734
2019-05-01 22:39:43 +00:00
Simon Pilgrim 9f04d97cd7 [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).

Differential Revision: https://reviews.llvm.org/D61263

llvm-svn: 359707
2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Nico Weber c991daa532 Option spell checking: Penalize delimiter flags if input has no argument
If the user passes a flag like `-version` to a program, it's more likely
they mean `--version` than `-version:`, since there's no parameter
passed. Hence, give delimited arguments a penalty of 1 if the user input
doesn't contain the delimiter or no data after it.

The motivation is that with this, lld-link can suggest "--version"
instead of "-version:" for "-version" and "-nodefaultlib" instead of
"-nodefaultlib:" for "-nodefaultlibs".

Differential Revision: https://reviews.llvm.org/D61382

llvm-svn: 359701
2019-05-01 16:45:15 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Teresa Johnson b3203ec078 [ThinLTO] Fix unreachable code when parsing summary entries.
Summary:
Early returns were causing some code to be skipped. This was missed
since the summary entries are typically at the end of the llvm assembly
file.

Fixes PR41663.

Reviewers: RKSimon, wristow

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61355

llvm-svn: 359697
2019-05-01 16:26:59 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Simon Pilgrim f5bdff7747 Fix 80 column violation. NFCI.
llvm-svn: 359694
2019-05-01 16:01:49 +00:00
Keno Fischer a3e4b3bd33 [SCEV] Use isKnownViaNonRecursiveReasoning for smax simplification
Summary:
Commit
	rL331949: SCEV] Do not use induction in isKnownPredicate for simplification umax

changed the codepath for umax from isKnownPredicate to
isKnownViaNonRecursiveReasoning to avoid compile time blow up (and as
I found out also stack overflows). However, there is an exact copy of
the code for umax that was lacking this change. In D50167 I want to unify
these codepaths, but to avoid that being a behavior change for the smax
case, pull this independent bit out of it.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D61166

llvm-svn: 359693
2019-05-01 15:58:24 +00:00
Simon Pilgrim 6711b9699a [X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode

llvm-svn: 359686
2019-05-01 14:50:50 +00:00
Nico Weber f68e0f79c7 Fix OptTable::findNearest() adding delimiter for free
Prior to this, OptTable::findNearest() thought that the input `--foo`
had an editing distance of 0 from an existing flag `--foo=`, which made
it suggest flags with delimiters more often than flags without one.
After this, it correctly assigns this case an editing distance of 1.

Differential Revision: https://reviews.llvm.org/D61373

llvm-svn: 359685
2019-05-01 14:46:17 +00:00
Keno Fischer d8f856d265 [LoopInfo] Faster implementation of setLoopID. NFC.
Summary:
This change was part of D46460. However, in the meantime rL341926 fixed the
correctness issue here. What remained was the performance issue in setLoopID
where it would iterate through all blocks in the loop and their successors,
rather than just the predecessor of the header (the later presumably being
much faster). We already have the `getLoopLatches` to compute precisely these
basic blocks in an efficient manner, so just use it (as the original commit
did for `getLoopID`).

Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61215

llvm-svn: 359684
2019-05-01 14:39:11 +00:00
Simon Pilgrim 3d6899e369 [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
2019-05-01 13:51:09 +00:00
Nico Weber 4e701ab177 Wrap to 80 columns, no behavior change
llvm-svn: 359679
2019-05-01 13:04:44 +00:00
Simon Pilgrim ba372c6e62 [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
2019-05-01 12:48:42 +00:00
Simon Pilgrim 951a6b4579 [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
2019-05-01 12:37:41 +00:00
Tim Northover ee2474df9f DAG: allow DAG pointer size different from memory representation.
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.

Pointers get zero-extended whenever they are loaded, and truncated prior to
stores.  In addition, a few not quite so obvious locations need updating:

  * A GEP that has not been marked inbounds needs to enforce the IR-documented
    2s-complement wrapping at the memory pointer size. Inbounds GEPs are
    undefined if they overflow the address space, so no additional operations
    are needed.
  * Signed comparisons would give incorrect results if performed on the
    zero-extended values.

This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.

llvm-svn: 359676
2019-05-01 12:37:30 +00:00
Simon Pilgrim 37c2419cc7 [X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
2019-05-01 11:29:36 +00:00
Simon Pilgrim 3353cee06c [X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
2019-05-01 11:08:03 +00:00
Simon Pilgrim f7b978a71b [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.

llvm-svn: 359669
2019-05-01 10:58:38 +00:00
Simon Pilgrim a7d107a3e0 [X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.

llvm-svn: 359667
2019-05-01 10:38:10 +00:00
Simon Pilgrim 99eefe94b5 [X86][SSE] Extract i1 elements from vXi1 bool vectors
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.

Differential Revision: https://reviews.llvm.org/D61189

llvm-svn: 359666
2019-05-01 10:02:22 +00:00
Craig Topper dd66acef96 [X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.

llvm-svn: 359658
2019-05-01 06:53:03 +00:00
David L. Jones fccb505f0f Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.

llvm-svn: 359648
2019-05-01 05:01:03 +00:00
Lang Hames 3181b87cb6 [JITLink] Make sure we explicitly deallocate memory on failure.
JITLinkGeneric phases 2 and 3 (focused on applying fixups and finalizing memory,
respectively) may fail for various reasons. If this happens, we need to
explicitly de-allocate the memory allocated in phase 1 (explicitly, because
deallocation may also fail and so is implemented as a method returning error).

No testcase yet: I am still trying to decide on the right way to test totally
platform agnostic code like this.

llvm-svn: 359643
2019-05-01 02:43:52 +00:00
Sam Clegg 6898781d87 [WebAssembly] Update expectations for gcc torture tests
This is needed to make the wasm waterfall green again
after we land the update to WASI:
https://github.com/WebAssembly/waterfall/pull/492

Differential Revision: https://reviews.llvm.org/D61351

llvm-svn: 359634
2019-04-30 23:10:28 +00:00
Philip Reames 84e54eb471 [InstCombine] Limit a vector demanded elts rule which was producing invalid IR.
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design).  It turns out that the LangRef disallows such cases when indexing structs.  The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.

This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.

llvm-svn: 359633
2019-04-30 23:09:26 +00:00
Alina Sbirlea b468320313 [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D61043

llvm-svn: 359627
2019-04-30 22:43:55 +00:00
Lang Hames 4637e15844 [ORC] Move SimpleCompiler/ConcurrentIRCompiler definitions into a .cpp file.
SimpleCompiler is no longer templated, so there's no reason for this code to be
in a header any more.

llvm-svn: 359626
2019-04-30 22:42:01 +00:00
Alina Sbirlea ba48a2c5e8 [AliasAnalysis/NewPassManager] Invalidate AAManager less often.
Summary:
This is a redo of D60914.

The objective is to not invalidate AAManager, which is stateless, unless
there is an explicit invalidate in one of the AAResults.

To achieve this, this patch adds an API to PAC, to check precisely this:
is this analysis not invalidated explicitly == is this analysis not abandoned == is this analysis stateless, so preserved without explicitly being marked as preserved by everyone

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61284

llvm-svn: 359622
2019-04-30 22:15:47 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Eric Christopher 6435102c03 Fix a few -Werror warnings:
- Remove a variable only used in an assert
 - Fix pessimizing move warning around copy elision

llvm-svn: 359617
2019-04-30 21:44:21 +00:00
Alina Sbirlea 4e1ac95cf5 [PassManagerBuilder] Add option for interleaved loops, for loop vectorize.
Summary:
Match NewPassManager behavior: add option for interleaved loops in the
old pass manager, and use that instead of the flag used to disable loop unroll.
No changes in the defaults.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61030

llvm-svn: 359615
2019-04-30 21:29:20 +00:00
Lang Hames d407b4b980 [JITLink] Add debugging output to print resolved external atoms.
llvm-svn: 359614
2019-04-30 21:28:07 +00:00
Lang Hames 88816bdd2f [ORC][JITLink] Name in-memory compiled objects after their source modules.
In-memory compiled object buffer identifiers will now be derived from the
identifiers of their source IR modules. This makes it easier to connect
in-memory objects with their source modules in debugging output.

llvm-svn: 359613
2019-04-30 21:27:56 +00:00
Rong Xu 998b97f6f1 [llvm-profdata] Add overlap command to compute similarity b/w two profile files
Add overlap functionality to llvm-profdata tool to compute the similarity
between two profile files.

Differential Revision: https://reviews.llvm.org/D60977

llvm-svn: 359612
2019-04-30 21:19:12 +00:00
Fedor Sergeev eeae45dc77 [NFC][InlineCost] cleanup - comments, overflow handling.
Reviewed By: apilipenko
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60751

llvm-svn: 359609
2019-04-30 20:44:53 +00:00
Simon Pilgrim 07ab4e7db8 [X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x)
This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality

Minor improvement for PR39709

llvm-svn: 359608
2019-04-30 20:31:07 +00:00
Nico Weber e7fa09e4ae Fix stack-use-after free after r359580
`Candidate` was a StringRef refering to a temporary string.
Instead, create a local variable for the string and use
a StringRef referring to that.

llvm-svn: 359604
2019-04-30 19:43:35 +00:00
Dan Gohman 3b5b9d0e72 [WebAssembly] Support EXPLICIT_NAME symbols in llvm-readobj
Teach llvm-readobj about WASM_SYMBOL_EXPLICIT_NAME.

Differential Revision: https://reviews.llvm.org/D61323

Reviewer: sbc100
llvm-svn: 359602
2019-04-30 19:30:24 +00:00
Dan Gohman 3a7532e645 [WebAssembly] Support f16 libcalls
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).

Differential Revision: https://reviews.llvm.org/D61287

Reviewer: dschuff
llvm-svn: 359600
2019-04-30 19:17:59 +00:00
Craig Topper cad318014e [X86] Remove if that's always true
It's been like this since it was added in a refactor of this code.

Fixes PR41659

llvm-svn: 359597
2019-04-30 19:02:15 +00:00
Evandro Menezes ea349f3ef5 [SimplifyLibCalls] Clean up code (NFC)
Fix pointer check after dereferencing (PR41665).

llvm-svn: 359595
2019-04-30 18:35:38 +00:00
Craig Topper 3958719dda [X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.

This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.

Differential Revision: https://reviews.llvm.org/D61164

llvm-svn: 359582
2019-04-30 17:56:47 +00:00
Craig Topper 965d1306ae [X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations.
This removes some of the class variables. Merge basic block processing into
runOnMachineFunction to keep the flags local.

Pass MachineBasicBlock around instead of an iterator. We can get the iterator in
the few places that need it. Allows a range-based outer for loop.

Separate the Atom optimization from the rest of the optimizations. This allows
fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA
when profitable by its heuristics.

I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index
register is equal to the destination register. This is profitable regardless of
the various slow flags. But again we would want Atom to be able to undo that.

Differential Revision: https://reviews.llvm.org/D60993

llvm-svn: 359581
2019-04-30 17:56:28 +00:00
Nico Weber 98ca8da55e Re-reland "[Option] Fix PR37006 prefix choice in findNearest"
This was first reviewed in https://reviews.llvm.org/D46776 and
landed in r332299, but got reverted because it broke the PS4
bots.

https://reviews.llvm.org/D50410 fixed this, and then this
change was re-reviewed at https://reviews.llvm.org/D50515 and
relanded in r341329. It got reverted due to causing MSan issues.
However, nobody wrote down the error message and the bot link
is dead, so I'm relanding this to capture the MSan error.
I'll then either fix it, or copy it somewhere and revert if
fixing looks difficult.

llvm-svn: 359580
2019-04-30 17:46:00 +00:00
Sanjay Patel 0387bf5269 [SelectionDAG] remove div-by-zero constant folding restriction
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.

We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).

This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.

llvm-svn: 359566
2019-04-30 14:37:15 +00:00
Simon Pilgrim 123e04b8a8 [TableGen] Fix null pointer dereferencing in token parser.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359559
2019-04-30 13:09:55 +00:00
Simon Pilgrim f5e8f222d6 Revert rL359519 : [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61043
........
This was causing windows build bot failures

llvm-svn: 359555
2019-04-30 12:34:21 +00:00
Sjoerd Meijer ea31ddb36f [ARM] Implement TTI::getMemcpyCost
This implements TargetTransformInfo method getMemcpyCost, which estimates the
number of instructions to which a memcpy instruction expands to.

Differential Revision: https://reviews.llvm.org/D59787

llvm-svn: 359547
2019-04-30 10:28:50 +00:00
Simon Pilgrim 22641cc194 Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512

Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.

Committed on behalf of @Serge_Preis (Serge Preis)

Differential Revision: https://reviews.llvm.org/D60852

llvm-svn: 359545
2019-04-30 10:18:25 +00:00
Sjoerd Meijer 0ed4619679 [TargetLowering] findOptimalMemOpLowering. NFCI.
This was a local static funtion in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.

Differential Revision: https://reviews.llvm.org/D59766

llvm-svn: 359543
2019-04-30 10:09:15 +00:00
Diana Picus 59a4c0481a [ARM GlobalISel] Widen small shift operands
The legalizer was already widening the shift amount. Add tests for that
behaviour, and also support widening the shifted value.

llvm-svn: 359542
2019-04-30 09:24:43 +00:00
Fangrui Song 7bce25cd7d [AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr
Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers.
A unique_ptr makes the ownership clearer.

llvm-svn: 359541
2019-04-30 09:14:02 +00:00
Diana Picus 1e88ac213b [ARM GlobalISel] Be more careful about bailing out
Bail out on function arguments/returns with types aggregating an
unsupported type. This fixes cases where we would happily and
incorrectly lower functions taking e.g. [1 x i64] parameters, when we
don't even support plain i64 yet.

llvm-svn: 359540
2019-04-30 09:05:25 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Alexander Potapenko 06d00afa61 MSan: handle llvm.lifetime.start intrinsic
Summary:
When a variable goes into scope several times within a single function
or when two variables from different scopes share a stack slot it may
be incorrect to poison such scoped locals at the beginning of the
function.
In the former case it may lead to false negatives (see
https://github.com/google/sanitizers/issues/590), in the latter - to
incorrect reports (because only one origin remains on the stack).

If Clang emits lifetime intrinsics for such scoped variables we insert
code poisoning them after each call to llvm.lifetime.start().
If for a certain intrinsic we fail to find a corresponding alloca, we
fall back to poisoning allocas for the whole function, as it's now
impossible to tell which alloca was missed.

The new instrumentation may slow down hot loops containing local
variables with lifetime intrinsics, so we allow disabling it with
-mllvm -msan-handle-lifetime-intrinsics=false.

Reviewers: eugenis, pcc

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60617

llvm-svn: 359536
2019-04-30 08:35:14 +00:00
Markus Lavin a475da36eb [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

This re-commit fixes issues reported in the first one. Namely deref was
inserted under wrong conditions and additionally the deref_size argument
was incorrectly encoded.

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 359535
2019-04-30 07:58:57 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), 
if adde is not legal for the target. Even it's at type-legalize phase. 
Because adde is special and will not be legalized at operation-legalize phase later.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Lang Hames b12867230c [ORC] Allow JITDylib definition generators to return Errors.
Background: A definition generator can be attached to a JITDylib to generate
new definitions in response to queries. For example: a generator that forwards
calls to dlsym can map symbols from a dynamic library into the JIT process on
demand.

If definition generation fails then the generator should be able to return an
error. This allows the JIT API to distinguish between the case where a
generator does not provide a definition, and the case where it was not able to
determine whether it provided a definition due to an error.

The immediate motivation for this is cross-process symbol lookups: If the
remote-lookup generator is attached to a JITDylib early in the search list, and
if a generator failure is misinterpreted as "no definition in this JITDylib" then
lookup may continue and bind to a different definition in a later JITDylib, which
is a bug.

llvm-svn: 359521
2019-04-30 00:03:26 +00:00
Alina Sbirlea 9a1edd14a2 [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61043

llvm-svn: 359519
2019-04-29 23:53:04 +00:00
Nico Weber e577be4ed1 [PDB] Fix hash function used to write /src/headerblock
lld-link used to write PDB files that DIA couldn't recover natvis
files from if:

- The global strings table was > 64kiB
- There were at least 3 natvis files

The cause was that the hash function for the /src/headerblock stream
was incorrect: It needs to be truncated to 16 bit.

If the global strings table was <= 64kiB, truncating to 16 bit is a
no-op, so this wasn't needed for small programs.

If there are only 1 or 2 natvis files, then the growth strategy in
HashTable::grow() would mean the hash table would have 2 buckets (for 1
natvis file) or 4 buckets (for 4 natvis files), and since the hash
function is used modulo number of buckets, and since 2 and 4 divide
0x10000, the missing `% 0x10000` is a no-op there too. For 3 natvis
files, the hash table grows to 6 buckets, which has a factor that's not
common with 0x10000 and the difference starts to matter.

Fixes PR41626.

Differential Revision: https://reviews.llvm.org/D61277

llvm-svn: 359515
2019-04-29 23:09:35 +00:00
Lang Hames eb14dc7585 [ORC] Replace the LLJIT/LLLazyJIT Create methods with Builder utilities.
LLJITBuilder and LLLazyJITBuilder construct LLJIT and LLLazyJIT instances
respectively. Over time these will allow more configurable options to be
added while remaining easy to use in the default case, which for default
in-process JITing is now:

auto J = ExitOnErr(LLJITBuilder.create());

llvm-svn: 359511
2019-04-29 22:37:27 +00:00
Dan Gohman 8d6e80f959 [WebAssembly] Make an assertion message prettier. NFC.
This is a follow-up to https://reviews.llvm.org/D59521.

llvm-svn: 359509
2019-04-29 22:37:08 +00:00
Steven Wu 6c9f6fd11b [ThinLTO] Adding architecture name into saved object filename
Summary:
For ThinLTOCodegenerator, it has an option to save the object file
outputs into a directory which is essential for debug info. Tools like lldb
and dsymutil will look for these object files for debug info.

On Darwin platform, you can link fat binaries with one single clang
driver invocation like:
 $ clang -arch x86_64 -arch i386 -Wl,-object_path_lto,$TMPDIR ...
Unfornately, the output object files for one architecture is going to
overwrite the previous ones and one architecture slice will end up with
no debug info. One example for this is to turn on ThinLTO for sanitizer
dylibs in compiler-rt project.

To fix the issue, add the name for the architecture into the name of the
output object file.

rdar://problem/35482935

Reviewers: tejohnson, bd1976llvm, dexonsmith, JDevlieghere

Reviewed By: dexonsmith

Subscribers: mehdi_amini, aprantl, inglorion, eraman, hiraditya, jkorous, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60924

llvm-svn: 359508
2019-04-29 21:39:54 +00:00
Dan Gohman 8306cb5702 [WebAssembly] Define the signature for __stack_chk_fail
The WebAssembly backend needs to know the signatures of all runtime
libcall functions. This adds the signature for __stack_chk_fail which was
previously missing.

Also, make the error message for a missing libcall include the name of
the function.

Differential Revision: https://reviews.llvm.org/D59521

Reviewed By: sbc100

llvm-svn: 359505
2019-04-29 21:09:44 +00:00
Roland Froese 728e139700 [PowerPC] Try harder to avoid load/move-to VSR for partial vector loads
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.

Differential Revision: https://reviews.llvm.org/D60102

llvm-svn: 359504
2019-04-29 21:08:35 +00:00
Jessica Paquette 7f6fe7c02c [GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.

Also factor out the code for finding an intrinsic ID.

llvm-svn: 359501
2019-04-29 20:58:17 +00:00
Martin Storsjo c0d138d147 [X86] Run CFIInstrInserter on Windows if Dwarf is used
This is necessary since SVN r330706, as tail merging can include
CFI instructions since then.

This fixes PR40322 and PR40012.

Differential Revision: https://reviews.llvm.org/D61252

llvm-svn: 359496
2019-04-29 20:25:51 +00:00
Simon Pilgrim 028485d7b9 [X86][SSE] isHorizontalBinOp - add support for target shuffles
Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support.

This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type

Fixes PR39936 and should help with PR39920/PR39921

Differential Revision: https://reviews.llvm.org/D61245

llvm-svn: 359491
2019-04-29 19:52:59 +00:00
Simon Pilgrim 9b17b80a0e computePolynomialFromPointer - add missing early-out return for non-pointer types.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359486
2019-04-29 19:25:16 +00:00
Sanjay Patel a706b9a90e [InstCombine] reduce code duplication; NFC
Follow-up to:
rL359482

Avoid this potential problem throughout by giving the type a name
and verifying the assumption that both operands are the same type.

llvm-svn: 359485
2019-04-29 19:23:44 +00:00
Simon Pilgrim 4559739f7c Remove duplicate line. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359483
2019-04-29 18:58:32 +00:00
Simon Pilgrim e3c8776172 [InstCombine] visitFCmpInst - appease copy+paste pattern warning. NFCI.
PVS Studio's copy+paste recognizer was seeing this as a typo, technically Op0/Op1 in a fcmp should always be the same type, but we might as well avoid the issue.

Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359482
2019-04-29 18:52:19 +00:00
Daniel Sanders 8f079844d0 [globalisel] Improve Legalizer debug output
* LegalizeAction should be printed by name rather than number
* Newly created instructions are incomplete at the point the observer first sees
  them. They are therefore recorded in a small vector and printed just before
  the legalizer moves on to another instruction. By this point, the instruction
  must be complete.

llvm-svn: 359481
2019-04-29 18:45:59 +00:00
Don Hinton 89e583b843 [CommandLine] Don't allow unlimitted dashes for options. Part 1 or 5
Summary:
Prior to this patch, the CommandLine parser would strip an
unlimitted number of dashes from options.  This patch limits it to
two.

Reviewers: rnk

Reviewed By: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61229

llvm-svn: 359480
2019-04-29 18:34:18 +00:00
Simon Pilgrim 0a5c2b2449 [X86] scaleShuffleMask - avoid potential signed overflow warning.
Use size_t assignment to prevent a bad explicit type conversion warning.

Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning.

Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359479
2019-04-29 18:32:06 +00:00
Simon Pilgrim 1c4c641ebc [TextAPI] Fix Symbol::dump which was failing to append the SymbolKind string.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359478
2019-04-29 18:25:04 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Thomas Preud'homme 15cb1f1501 FileCheck [3/12]: Stricter parsing of @LINE expressions
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch gives earlier and better
diagnostics for the @LINE expressions.

Rather than detect parsing errors at matching time, this commit adds
enhance parsing to detect issues with @LINE expressions at parse time
and diagnose them more accurately.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60383

llvm-svn: 359475
2019-04-29 17:46:26 +00:00
Simon Pilgrim 19cde62008 Avoid "checking a pointer after dereferencing" warning. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359473
2019-04-29 17:38:18 +00:00
Simon Pilgrim 6f349d8c39 Move if() to newline to stop ambiguity over whether it should be else if. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359472
2019-04-29 17:34:26 +00:00
Simon Pilgrim 2755b73ba0 Fix operator precedence warning. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359469
2019-04-29 17:04:14 +00:00
Simon Pilgrim 864cf8e274 Remove superfluous break from switch statement. NFCI.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359467
2019-04-29 16:45:35 +00:00
Quentin Colombet 31ce274207 [BlockExtractor] Expose a constructor for the group extraction
NFC

Differential Revision: https://reviews.llvm.org/D60971

llvm-svn: 359463
2019-04-29 16:14:02 +00:00
Quentin Colombet ae2cbb3400 [BlockExtractor] Change the basic block separator from ',' to ';'
This change aims at making the file format be compatible with the
way LLVM handles command line options.

Differential Revision: https://reviews.llvm.org/D60970

llvm-svn: 359462
2019-04-29 16:14:00 +00:00
Simon Pilgrim cbf3501e56 [X86] Remove duplicate string comparison
Fix typo introduced in rL332824 where we simplified the extact string matches for "avx512.mask.permvar.sf.256" and "avx512.mask.permvar.si.256" to a string startswith test for "avx512.mask.permvar."

llvm-svn: 359460
2019-04-29 16:03:35 +00:00
Cullen Rhodes 2c0d5043a7 [AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.

llvm-svn: 359457
2019-04-29 15:27:27 +00:00
Thomas Preud'homme 5a33047022 FileCheck [2/12]: Stricter parsing of -D option
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch gives earlier and better
diagnostics for the -D option.

Prior to this change, parsing of -D option was very loose: it assumed
that there is an equal sign (which to be fair is now checked by the
FileCheck executable) and that the part on the left of the equal sign
was a valid variable name. This commit adds logic to ensure that this
is the case and gives diagnostic when it is not, making it clear that
the issue came from a command-line option error. This is achieved by
sharing the variable parsing code into a new function ParseVariable.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60382

llvm-svn: 359447
2019-04-29 13:32:36 +00:00
Yevgeny Rouban 0822bfc6de [LoopSimplifyCFG] Suppress expensive DomTree verification
This patch makes verification level lower for builds with
inexpensive checks.

Differential Revision: https://reviews.llvm.org/D61055

llvm-svn: 359446
2019-04-29 13:29:55 +00:00
Diogo N. Sampaio d95abb170b [ARM] Add bitcast/extract_subvec. of fp16 vectors
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.

Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard

Reviewed By: ostannard

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60618

llvm-svn: 359433
2019-04-29 10:28:07 +00:00
Diogo N. Sampaio 2078eb745d [ARM] Add v4f16 and v8f16 types to the CallingConv
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.

Reviewers: miyuki, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60720

llvm-svn: 359431
2019-04-29 10:10:37 +00:00
David Chisnall 714a4425de Try to use /proc on FreeBSD for getExecutablePath
Currently, clang's libTooling passes this function a fake argv0, which
means that no libTooling tools can find the standard headers on FreeBSD.
With this change, these will now work on any FreeBSD systems that have
procfs mounted.  This isn't the right fix for the libTooling issue, but
it does bring the FreeBSD implementation of getExecutablePath closer to
the Linux and macOS implementations.

llvm-svn: 359427
2019-04-29 09:24:51 +00:00
Jeremy Morse 055aee1d8a [DebugInfo] Terminate more location-list ranges at the end of blocks
This patch fixes PR40795, where constant-valued variable locations can
"leak" into blocks placed at higher addresses. The root of this is that
DbgEntityHistoryCalculator terminates all register variable locations at
the end of each block, but not constant-value variable locations.

Fixing this requires constant-valued DBG_VALUE instructions to be
broadcast into all blocks where the variable location remains valid, as
documented in the LiveDebugValues section of SourceLevelDebugging.rst,
and correct termination in DbgEntityHistoryCalculator.

Differential Revision: https://reviews.llvm.org/D59431

llvm-svn: 359426
2019-04-29 09:13:16 +00:00
Fangrui Song 97b8cd54ad [DWARF] Fix dump of local/foreign TU lists in .debug_names
Differential Revision: https://reviews.llvm.org/D61241

llvm-svn: 359425
2019-04-29 08:55:10 +00:00
Fangrui Song cc1fec31d9 [DWARF] Delete a redundant check in getFileNameByIndex()
llvm-svn: 359422
2019-04-29 08:15:13 +00:00
Craig Topper 9202d5f8f1 [X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax.
The 128/256 bit version of these instructions require an 'x' or 'y' suffix to
disambiguate the memory form in att syntax.

We were allowing the same suffix in intel syntax, but it appears gas does not
do that.

gas does allow the 'x' and 'y' suffix on register and broadcast forms even
though its not needed. We were allowing it on unmasked register form, but not on
masked versions or on masked or unmasked broadcast form.

While there fix some test coverage holes so they can be extended with the 'x'
and 'y' suffix tests.

llvm-svn: 359418
2019-04-29 06:13:41 +00:00
Nico Weber cf6267cecb llvm-cvtres: Attempt to make llvm-cvtres/duplicate.test work on big-endian systems
llvm-svn: 359414
2019-04-29 00:51:41 +00:00
Simon Pilgrim d5cc753b6d [X86][SSE] combineExtractVectorElt - add early-out to return zero/undef for out-of-range extraction indices.
llvm-svn: 359406
2019-04-28 19:12:58 +00:00
Nikita Popov 7a94795b2b [ConstantRange] Add makeExactNoWrapRegion()
I got confused on the terminology, and the change in D60598 was not
correct. I was thinking of "exact" in terms of the result being
non-approximate. However, the relevant distinction here is whether
the result is

 * Largest range such that:
   Forall Y in Other: Forall X in Result: X BinOp Y does not wrap.
   (makeGuaranteedNoWrapRegion)
 * Smallest range such that:
   Forall Y in Other: Forall X not in Result: X BinOp Y wraps.
   (A hypothetical makeAllowedNoWrapRegion)
 * Both. (makeExactNoWrapRegion)

I'm adding a separate makeExactNoWrapRegion method accepting a
single APInt (same as makeExactICmpRegion) and using it in the
places where the guarantee is relevant.

Differential Revision: https://reviews.llvm.org/D60960

llvm-svn: 359402
2019-04-28 15:40:56 +00:00
Simon Pilgrim 22d1476bfa [X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.

llvm-svn: 359400
2019-04-28 14:31:01 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Simon Pilgrim 93ad48210c [X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840)
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.

Differential Revision: https://reviews.llvm.org/D61230

llvm-svn: 359396
2019-04-28 10:46:17 +00:00
Craig Topper bd35a30940 [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.

Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.

To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.

I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.

I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.

The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.

Fixes PR41619.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61223

llvm-svn: 359392
2019-04-28 06:25:33 +00:00
Simon Pilgrim 03c4e2663c Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
........
Causing irregular buildbot failures.

llvm-svn: 359391
2019-04-27 20:44:08 +00:00
Simon Pilgrim 4118be3af6 [X86][SSE] Add support for <64 x i1> bool reduction
Minor generalization of the existing <32 x i1> pre-AVX2 split code.

llvm-svn: 359389
2019-04-27 20:04:44 +00:00
Simon Pilgrim 2a2d422400 [X86][AVX512] Improve vector bool reductions
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.

llvm-svn: 359386
2019-04-27 17:32:46 +00:00
Fangrui Song 795c00b21f [DJB] Fix variable case after D61178
llvm-svn: 359381
2019-04-27 15:33:22 +00:00
Simon Pilgrim acc1e6d1c6 [X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332)
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.

We can extend this in the future to handle more opcodes and non-zero selections.

llvm-svn: 359378
2019-04-27 13:35:32 +00:00
Andrea Di Biagio d77dc9ada2 [MCA] Add field `IsEliminated` to class Instruction. NFCI
llvm-svn: 359377
2019-04-27 11:59:11 +00:00
Craig Topper 063b471ff7 [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.

Reviewers: spatel, RKSimon, reames, jfb, efriedma

Reviewed By: RKSimon

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60546

llvm-svn: 359368
2019-04-27 03:38:15 +00:00
Mark Searles 76c5b62988 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin 4f331cb1f3 [AMDGPU] gfx1010 VOPC implementation
Differential Revision: https://reviews.llvm.org/D61208

llvm-svn: 359358
2019-04-26 23:16:16 +00:00
Lang Hames a9fdf375b3 [ORC] Add a 'plugin' interface to ObjectLinkingLayer for events/configuration.
ObjectLinkingLayer::Plugin provides event notifications when objects are loaded,
emitted, and removed. It also provides a modifyPassConfig callback that allows
plugins to modify the JITLink pass configuration.

This patch moves eh-frame registration into its own plugin, and teaches
llvm-jitlink to only add that plugin when performing execution runs on
non-Windows platforms. This should allow us to re-enable the test case that was
removed in r359198.

llvm-svn: 359357
2019-04-26 22:58:39 +00:00
Jessica Paquette 76f64b665b [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.

Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.

llvm-svn: 359351
2019-04-26 21:53:13 +00:00
Nick Desaulniers 7ab164c4a4 [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Simon Pilgrim 27e01e675c [X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use
llvm-svn: 359332
2019-04-26 18:02:14 +00:00
Jessica Paquette 67ab9eb193 [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.

Update select-bswap.mir and arm64-rev.ll to reflect the changes.

llvm-svn: 359331
2019-04-26 18:00:01 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Simon Pilgrim ef54b1dddf [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.
Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format.

llvm-svn: 359326
2019-04-26 17:49:02 +00:00
Nikita Popov c0fa4ec01d [ConstantRange] Add abs() support
Add support for abs() to ConstantRange. This will allow to handle
SPF_ABS select flavor in LVI and will also come in handy as a
primitive for the srem implementation.

The implementation is slightly tricky, because a) abs of signed min
is signed min and b) sign-wrapped ranges may have an abs() that is
smaller than a full range, so we need to explicitly handle them.

Differential Revision: https://reviews.llvm.org/D61084

llvm-svn: 359321
2019-04-26 16:50:31 +00:00
Craig Topper 354247c08d [X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI
llvm-svn: 359318
2019-04-26 16:39:38 +00:00
Craig Topper ad662cf4c1 [X86] Segment registers should have i16 type not i32.
Probably doesn't really matter, but was inconsistent with the rest of the code.

llvm-svn: 359317
2019-04-26 16:39:35 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Roland Froese 4b17772b9e [PowerPC] Update P9 vector costs for insert/extract element
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions.  Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.

Differential Revision: https://reviews.llvm.org/D60160

llvm-svn: 359313
2019-04-26 16:14:17 +00:00
Fangrui Song 3153764c88 s/Dwarf 5/DWARF v5/ NFC
llvm-svn: 359307
2019-04-26 13:41:19 +00:00
Simon Pilgrim c3a34c3e07 Fix Wparentheses warning. NFCI.
llvm-svn: 359299
2019-04-26 12:23:42 +00:00
Simon Pilgrim bb230c5e79 [X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching code from LowerVectorAllZeroTest
Create a matchBitOpReduction helper that checks for the pattern with any opcode.

First step towards reusing this code to recognize other scalar reduction patterns.

llvm-svn: 359296
2019-04-26 11:45:54 +00:00
Fangrui Song 50dcd8bf90 caseFoldingDjbHash: simplify and make the US-ASCII fast path faster
The slow path (with at least one non US-ASCII) will be slower but that
doesn't matter.

Differential Revision: https://reviews.llvm.org/D61178

llvm-svn: 359294
2019-04-26 10:56:10 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Simon Pilgrim 5e161df9f8 [X86][AVX] Combine shuffles extracted from a common vector
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380.

Differential Revision: https://reviews.llvm.org/D60512

llvm-svn: 359292
2019-04-26 09:56:14 +00:00
Sven van Haastregt 66f612601d [InferAddressSpaces] Add AS parameter to the pass factory
This enables the pass to be used in the absence of
TargetTransformInfo. When the argument isn't passed, the factory
defaults to UninitializedAddressSpace and the flat address space is
obtained from the TargetTransformInfo as before this change. Existing
users won't have to change.

Patch by Kevin Petit.

Differential Revision: https://reviews.llvm.org/D60602

llvm-svn: 359290
2019-04-26 09:21:25 +00:00
Hans Wennborg 5d5ee4aff7 Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.

Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)

Differential revision: https://reviews.llvm.org/D61124

llvm-svn: 359287
2019-04-26 08:31:00 +00:00
Marcello Maggioni c596584f67 [GlobalISel] Fix inserting copies in the right position for reg definitions
When constrainRegClass is called if the constraining happens on a use the COPY
needs to be inserted before the instruction that contains the MachineOperand,
but if we are constraining a definition it actually needs to be added
after the instruction. In addition, the COPY needs to have its operands
flipped (in the use case we are copying from the old unconstrained register
to the new constrained register, while in the definition case we are copying
from the new constrained register that the instruction defines to the old
unconstrained register).

llvm-svn: 359282
2019-04-26 07:21:56 +00:00
Justin Bogner df5d2b3846 [GlobalOpt] Swap the expensive check for cold calls with the cheap TTI check
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.

This change is NFC, but I see significant compile time improvements on
some code with pathologically many functions.

llvm-svn: 359253
2019-04-26 00:12:50 +00:00
Lang Hames 4f71049a39 [ORC] Remove symbols from dependency lists when failing materialization.
When failing materialization of a symbol X, remove X from the dependants list
of any of X's dependencies. This ensures that when X's dependencies are
emitted (or fail themselves) they do not try to access the no-longer-existing
MaterializationInfo for X.

llvm-svn: 359252
2019-04-25 23:31:33 +00:00
Artem Belevich 5fe85a003f [CUDA] Implemented _[bi]mma* builtins.
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
2019-04-25 22:28:09 +00:00
Artem Belevich 16737538f4 PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.

The test generation script wmma.py has been refactored in a similar way.

Differential Revision: https://reviews.llvm.org/D60015

llvm-svn: 359247
2019-04-25 22:27:57 +00:00
Artem Belevich 8d825b38ed [NVPTX] generate correct MMA instruction mnemonics with PTX63+.
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate correct name, now we pass current
PTX version to each instruction as an extra constant operand
and InstPrinter adjusts its output accordingly.

Differential Revision: https://reviews.llvm.org/D59393

llvm-svn: 359246
2019-04-25 22:27:46 +00:00
Artem Belevich 7ecd82ce19 [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC.
Generalized constructions of 'fragments' of MMA operations to provide
common primitives for construction of the ops. This will make it easier
to add new variants of the instructions that operate on integer types.

Use nested foreach loops which makes it possible to better control
naming of the intrinsics.

This patch does not affect LLVM's output, so there are no test changes.

Differential Revision: https://reviews.llvm.org/D59389

llvm-svn: 359245
2019-04-25 22:27:35 +00:00
Sean Fertile a93a33cb87 [Object][XCOFF] Add intial support for section header table.
Adds a representation of the section header table to XCOFFObjectFile,
and implements enough to dump the section headers with llvm-obdump.

Differential Revision: https://reviews.llvm.org/D60784

llvm-svn: 359244
2019-04-25 21:36:04 +00:00
Stanislav Mekhanoshin 917c477a07 [AMDGPU] gfx1010 - fix ubsan failure
Revert DecoderNamespace in one place for now. It will need more
changes to properly work.

llvm-svn: 359239
2019-04-25 20:39:06 +00:00
David Blaikie 0c4dbf9ecd Assigning to a local object in a return statement prevents copy elision. NFC.
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.

P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.

(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)

Reviewed By: dblaikie

Author: Arthur O'Dwyer

Differential Revision: https://reviews.llvm.org/D54885

llvm-svn: 359236
2019-04-25 20:09:00 +00:00
Jessica Paquette f54258c888 [GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16s
This case was missing before, so we couldn't legalize it.

Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.

llvm-svn: 359231
2019-04-25 20:00:57 +00:00
Akira Hatanaka 8edf8f317b [ObjC][ARC] Let ARC optimizer bail out if the number of pointer states
it keeps track of becomes too large

ARC optimizer does a top-down and a bottom-up traversal of the whole
function to pair up retain and release instructions and remove them.
This can be expensive if the number of instructions in the function and
pointer states it tracks are large since it has to look at each pointer
state and determine whether the instruction being visited can
potentially use the pointer.

This patch adds a command line option that sets a limit to the number of
pointers it tracks.

rdar://problem/49477063

Differential Revision: https://reviews.llvm.org/D61100

llvm-svn: 359226
2019-04-25 19:42:55 +00:00
Stanislav Mekhanoshin 2c97ff07bf [AMDGPU] gfx1010 VOP1 instructions
Differential Revision: https://reviews.llvm.org/D61099

llvm-svn: 359225
2019-04-25 19:01:51 +00:00
Stanislav Mekhanoshin 956b0be72e [AMDGPU] gfx1010 utility functions
Differential Revision: https://reviews.llvm.org/D61094

llvm-svn: 359224
2019-04-25 18:53:41 +00:00
Jessica Paquette 8184b6e7f6 [GlobalISel][AArch64] Add generic legalization rule for extends
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).

Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.

Differential Revision: https://reviews.llvm.org/D60889

llvm-svn: 359222
2019-04-25 18:42:00 +00:00
Craig Topper f9c30eddd0 [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector.
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.

This patch falls back to doing the conversion through memory.

Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.

Differential Revision: https://reviews.llvm.org/D61114

llvm-svn: 359219
2019-04-25 18:19:59 +00:00
Robert Lougher d469133f95 [Evaluator] Walk initial elements when handling load through bitcast
When evaluating a store through a bitcast, the evaluator tries to move the
bitcast from the pointer onto the stored value. If the cast is invalid, it
tries to "introspect" the type to get a valid cast by obtaining a pointer to
the initial element (if the type is nested, this may require walking several
initial elements).

In some situations it is possible to get a bitcast on a load (e.g. with
unions, where the bitcast may not be the same type as the store). However,
equivalent logic to the store to introspect the type is missing. This patch
add this logic.

Note, when developing the patch I was unhappy with adding similar logic
directly to the load case as it could get out of step. Instead, I have
abstracted the "introspection" into a helper function, with the specifics
being handled by a passed-in lambda function.

Differential Revision: https://reviews.llvm.org/D60793

llvm-svn: 359205
2019-04-25 17:00:01 +00:00
Jessica Paquette ba55767f51 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette bd7ac30b15 [GlobalISel] Add IRTranslator support for G_FNEARBYINT
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60922

llvm-svn: 359203
2019-04-25 16:39:28 +00:00
Simon Pilgrim 48a3b54572 [InstCombine][X86] Tweak generic expansion of PACKSS/PACKUS to shuffle then truncate. NFCI.
This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.

llvm-svn: 359191
2019-04-25 13:51:57 +00:00
Sam McCall a7edcfb533 [Support] Add JSON streaming output API, faster where the heavy value types aren't needed.
Summary:
There's still a little bit of constant factor that could be trimmed (e.g.
more overloads to avoid round-tripping primitives through json::Value).
But this solves the memory scaling problem, and greatly improves the performance
constant factor, and the API should leave room for optimization if needed.

Adapt TimeProfiler to use it, eliminating almost all the performance regression
from r358476.

Performance test on my machine:
perf stat -r 5 ~/llvmbuild-opt/bin/clang++ -w -S -ftime-trace -mllvm -time-trace-granularity=0 spirit.cpp

Handcrafted JSON (HEAD=r358532 with r358476 reverted): 2480ms
json::Value (HEAD): 2757ms (+11%)
After this patch: 2520 ms (+1.6%)

Reviewers: anton-afanasyev, lebedev.ri

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60804

llvm-svn: 359186
2019-04-25 12:51:42 +00:00
Fangrui Song f6a6290908 Parallel: only allow the first TaskGroup to run tasks parallelly
Summary:
Concurrent (e.g. nested) llvm::parallel::for_each() may lead to dead
locks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).

When parallel_for_each() is about to return, in ~Latch() called by
~TaskGroup(), a thread (in the default executor) may block in
Latch::sync() waiting for Count to become zero. If all threads in the
default executor are blocked, it is a dead lock.

To fix this, force serial execution if the current TaskGroup is not the
first one. For a nested llvm::parallel::for_each(), this parallelizes
the outermost loop and serializes inner loops.

Differential Revision: https://reviews.llvm.org/D61115

llvm-svn: 359182
2019-04-25 11:33:30 +00:00
Florian Hahn 1038137f14 [ConstantRange] [a, b) udiv a full range is [0, umax(b)).
Reviewers: nikic, spatel, efriedma

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D60536

llvm-svn: 359180
2019-04-25 10:12:43 +00:00
Ilya Biryukov 6fae38ec91 [Testing] Move clangd::Annotations to llvm testing support
Summary:
Annotations allow writing nice-looking unit test code when one needs
access to locations from the source code, e.g. running code completion
at particular offsets in a file. See comments in Annotations.cpp for
more details on the API.

Also got rid of a duplicate annotations parsing code in clang's code
complete tests.

Reviewers: gribozavr, sammccall

Reviewed By: gribozavr

Subscribers: mgorny, hiraditya, ioeric, MaskRay, jkorous, arphaman, kadircet, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D59814

llvm-svn: 359179
2019-04-25 10:08:31 +00:00
George Rimar 45d042ed96 [yaml2obj] - Don't crash on invalid inputs.
yaml2obj might crash on invalid input when unable to parse the YAML.

Recently a crash with a very similar nature was fixed for an empty files. 
This patch revisits the fix and does it in yaml::Input instead.
It seems to be more correct way to handle such situation.

With that crash for invalid inputs is also fixed now.

Differential revision: https://reviews.llvm.org/D61059

llvm-svn: 359178
2019-04-25 09:59:55 +00:00
Simon Pilgrim 4b7d3c4831 Fix include order. NFCI.
llvm-svn: 359177
2019-04-25 09:49:37 +00:00
Simon Pilgrim 0a7d1b3ce1 [X86][SSE] combineBitcastvxi1 - add support for bitcasting to non-scalar integers
Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type.

We still have the issue identified in PR41594 but D61114 should handle this.

llvm-svn: 359176
2019-04-25 09:34:36 +00:00
Simon Atanasyan a0291110da [MIPS] Use custom bitcast lowering to avoid excessive instructions
On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1
when using bitcast i64 to double or an sdc1 and two lw instructions when
using bitcast double to i64. By introducing custom lowering that uses
mtc1/mthc1 we can avoid excessive instructions.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D61069

llvm-svn: 359171
2019-04-25 07:47:28 +00:00
Craig Topper 013503c78d [X86] Remove part of an if condition that should always be true.
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.

llvm-svn: 359170
2019-04-25 06:08:02 +00:00
Alina Sbirlea 733c8c40c8 Enable LoopVectorization by default.
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.

Reviewers: chandlerc, jgorbe

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61091

llvm-svn: 359167
2019-04-25 04:49:48 +00:00
Philip Reames 88cd69b56f Consolidate existing utilities for interpreting vector predicate maskes [NFC]
llvm-svn: 359163
2019-04-25 02:30:17 +00:00
Kit Barton 8e64f0a649 Fix unused variable warning in LoopFusion pass.
Do not wrap the contents of printFusionCandidates in the LLVM_DEBUG macro. This
fixes an unused variable warning generated when compiling without asserts but
with -DENABLE_LLVM_DUMP.

Differential Revision: https://reviews.llvm.org/D61035

llvm-svn: 359161
2019-04-25 02:10:02 +00:00
Philip Reames 7c8647b26f [InstCombine] Be consistent w/handling of masked intrinsics style wise [NFC]
llvm-svn: 359160
2019-04-25 01:18:56 +00:00
Austin Kerbow 83e52142d1 Fix spelling error. NFC
Summary: Test commit.

Reviewers: msearles, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61093

llvm-svn: 359154
2019-04-24 23:32:21 +00:00
Nico Weber 23cb79ff93 llvm-cvtres: Make new dupe resource error a bit friendlier
For well-known type IDs, include the name of the type.

To not duplicate the ID->name map, make llvm-readobj call this new
function as well.  It has slightly different output, so this also
requires updating a few tests.

Differential Revision: https://reviews.llvm.org/D61086

llvm-svn: 359153
2019-04-24 23:26:30 +00:00
JF Bastien fb742da34c posix_spawn should retry upon EINTR
Summary:
We've seen cases of bots failing with:
  clang: error: unable to execute command: posix_spawn failed: Interrupted system call

Add a small retry loop to posix_spawn in case this happens. Don't retry too much in case there's some systemic problem going on, but retry a few times.
<rdar://problem/50181448>

Reviewers: Bigcheese, arphaman

Subscribers: jkorous, dexonsmith, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61096

llvm-svn: 359152
2019-04-24 23:24:53 +00:00
Amy Huang 68c9199493 Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61083

llvm-svn: 359149
2019-04-24 23:02:48 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
Joerg Sonnenberger 8372b467f1 [PowerPC] Allow using initial-exec TLS with PIC
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.

Differential Revision: https://reviews.llvm.org/D61026

llvm-svn: 359146
2019-04-24 22:12:22 +00:00
Sean Fertile 526633deea Add period at end of comment.
llvm-svn: 359144
2019-04-24 21:51:30 +00:00
Craig Topper 6932abee2c [X86] Attempt to fix use-after-poison from r359121.
llvm-svn: 359143
2019-04-24 21:48:24 +00:00
Stanislav Mekhanoshin 9d287358a8 [AMDGPU] gfx1010 SOP instructions
Differential Revision: https://reviews.llvm.org/D61080

llvm-svn: 359139
2019-04-24 20:44:34 +00:00
Alexey Bataev ef3c1884ec [SLP] Fix crash after r358519, by V. Porpodas.
Summary: The code did not check if operand was undef before casting it to Instruction.

Reviewers: RKSimon, ABataev, dtemirbulatov

Reviewed By: ABataev

Subscribers: uabelho

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61024

llvm-svn: 359136
2019-04-24 20:21:32 +00:00
Reid Kleckner c06a470fc8 Try once more to ensure constant initializaton of ManagedStatics
First, use the old style of linker initialization for MSVC 2019 in
addition to 2017. MSVC 2019 emits a dynamic initializer for
ManagedStatic when compiled in debug mode, and according to zturner,
also sometimes in release mode. I wasn't able to reproduce that, but it
seems best to stick with the old code that works.

When clang is using the MSVC STL, we have to give ManagedStatic a
constexpr constructor that fully zero initializes all fields, otherwise
it emits a dynamic initializer. The MSVC STL implementation of
std::atomic has a non-trivial (but constexpr) default constructor that
zero initializes the atomic value. Because one of the fields has a
non-trivial constructor, ManagedStatic ends up with a non-trivial ctor.
The ctor is not constexpr, so clang ends up emitting a dynamic
initializer, even though it simply does zero initialization. To make it
constexpr, we must initialize all fields of the ManagedStatic.

However, while the constructor that takes a pointer is marked constexpr,
clang says it does not evaluate to a constant because it contains a cast
from a pointer to an integer. I filed this as:
https://developercommunity.visualstudio.com/content/problem/545566/stdatomic-value-constructor-is-not-actually-conste.html

Once we do that, we can add back the
LLVM_REQUIRE_CONSTANT_INITIALIZATION marker, and so far as I'm aware it
compiles successfully on all supported targets.

llvm-svn: 359135
2019-04-24 20:13:23 +00:00
Xinliang David Li 499c80b890 Add optional arg to profile count getters to filter
synthetic profile count.

Differential Revision: http://reviews.llvm.org/D61025

llvm-svn: 359131
2019-04-24 19:51:16 +00:00
Craig Topper af194e9380 [X86] Prevent folding a load into an AND if that AND is really a ZEXT_INREG that should use movzx.
This can save a 32-bit immediate move.

We would shrink the load and fold it if it was non-volatile, but that's trickier to check for.

llvm-svn: 359129
2019-04-24 19:28:38 +00:00
Adrian Prantl c90ff5e123 Revert using fcopyfile(3) to implement sys::fs::copy_file(Twine, int) on macOS
It turns out that I mesread the man page and fcopyfile(3) does not
actually support COPYFILE_CLONE for files.

<rdar://problem/50148757>

llvm-svn: 359127
2019-04-24 19:08:43 +00:00
David Blaikie 832c7d9f36 DebugInfo: Emit only declarations (not whole definitions) of non-unit user defined types into type units
While this doesn't come up in reasonable cases currently (the only user
defined types not in type units are ones without linkage - which makes
for near-ODR violations, because it'd be a type with linkage referencing
a type without linkage - such a type can't be validly defined in more
than one TU, so arguably it shouldn't be in a type unit to begin with -
but it's a convenient way to demonstrate an issue that will become more
revalent with homed modular debug info type definitions - which also
don't need to be in type units but more legitimately so).

Precursor to the Clang change to de-type-unit (by omitting the
'identifier') types homed due to strong linkage vtables. (making that
change without this one would lead to major type duplication in type
units)

llvm-svn: 359122
2019-04-24 18:09:44 +00:00
Craig Topper 882ca6d484 [X86] Remove dead nodes left after ReplaceAllUsesWith calls during address matching
ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes.

One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match.

Differential Revision: https://reviews.llvm.org/D61048

llvm-svn: 359121
2019-04-24 18:02:07 +00:00
Stanislav Mekhanoshin 33d806a517 [AMDGPU] gfx1010 sgpr register changes
Differential Revision: https://reviews.llvm.org/D61045

llvm-svn: 359117
2019-04-24 17:28:30 +00:00
Robert Widmann 09c5b883cb [LLVM-C] Deprecate the LLVMValueRef-returning metadata creation functions
Summary: There is still some value in using these functions while the remaining LLVMValueRef-based accessors are still around, but LLVMMDNodeInContext in particular has some wonky semantics that make it worth replacing outright.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60524

llvm-svn: 359114
2019-04-24 17:05:08 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Simon Pilgrim 55f14dac74 [InstCombine][X86] Use generic expansion of PACKSS/PACKUS for constant folding. NFCI.
This patch rewrites the existing PACKSS/PACKUS constant folding code to expand as a generic expansion.

This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).

llvm-svn: 359111
2019-04-24 16:53:17 +00:00
Nico Weber 8d05eb8556 llvm-undname: Fix assert-on->4GiB-string-literal, found by oss-fuzz
llvm-svn: 359109
2019-04-24 16:09:38 +00:00
Lang Hames b1ba4d8a8a [JITLink] Refer to FDE's CIE (not the most recent CIE) when parsing eh-frame.
Frame Descriptor Entries (FDEs) have a pointer back to a Common Information
Entry (CIE) that describes how the rest FDE should be parsed. JITLink had been
assuming that FDEs always referred to the most recent CIE encountered, but the
spec allows them to point back to any previously encountered CIE. This patch
fixes JITLink to look up the correct CIE for the FDE.

The testcase is a MachO binary with an FDE that refers to a CIE that is not the
one immediately proceeding it (the layout can be viewed wit
'dwarfdump --eh-frame <testcase>'. This test case had to be a binary as llvm-mc
now sorts FDEs (as of r356216) to ensure FDEs *do* point to the most recent CIE.

llvm-svn: 359105
2019-04-24 15:15:55 +00:00
Dmitry Preobrazhensky 47621d7c89 [AMDGPU][MC] Parser cleanup and refactoring
Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60767

llvm-svn: 359096
2019-04-24 14:06:15 +00:00
Sanjay Patel b1b3368907 [x86] make sure horizontal op and broadcast types match to simplify (PR41414)
If the types don't match, we can't just remove the shuffle.
There may be some other opportunity for optimization here,
but this should prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=41414

llvm-svn: 359095
2019-04-24 14:05:08 +00:00
whitequark 50392a3b1b [LLVM-C] Use dyn_cast instead of unwrap in LLVMGetDebugLoc functions
Summary:
The `unwrap<Type>` calls can assert with:
```
Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast
```
so replace them with `dyn_cast`.

Reviewers: whitequark, abdulras, hiraditya, compnerd

Reviewed By: whitequark

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60473

llvm-svn: 359093
2019-04-24 13:30:03 +00:00
Simon Pilgrim d30745b2a0 [X86] Add shouldFoldConstantShiftPairToMask override placeholder. NFCI.
Prep work toward fixing PR40758

llvm-svn: 359088
2019-04-24 12:34:08 +00:00
Nico Weber ccf096463a Let llvm-cvtres (and lld-link) report duplicate resources
If two .res files contain the same resource, cvtres.exe (and hence
link.exe) reject the input with this message:

    CVTRES : fatal error CVT1100: duplicate resource.  type:STRING, name:101, language:0x0409
    LINK : fatal error LNK1123: failure during conversion to COFF: file invalid or corrupt

llvm-cvtres (and lld-link) used to silently pick one of the duplicate
resources instead. This patch makes them report an error as well.
We slightly improve on cvtres by printing the name of two .res files
containing duplicate entries as well.

Differential Revision: https://reviews.llvm.org/D61049

llvm-svn: 359083
2019-04-24 11:42:59 +00:00
Bjorn Pettersson 71e8c6f20f Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.

It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.

Reviewers: hfinkel, materi, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61038

llvm-svn: 359072
2019-04-24 06:55:50 +00:00
Craig Topper 1e413ffa7b [Mips][CodeGen] Remove MachineFunction::setSubtarget. Change Mips to just copy the subtarget from the MachineFunction instead of recalculating it.
Summary:
The MachineFunction should have been created with the correct subtarget. As
long as there is no way to change it, MipsTargetMachine can just capture it
directly from the MachineFunction without calling getSubtargetImpl again.

While there, const correct the Subtarget pointer to avoid a const_cast.

I believe the Mips16Subtarget and NoMips16Subtarget members are never used, but
I'll leave there removal for a separate patch.

Reviewers: echristo, atanasyan

Reviewed By: atanasyan

Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60936

llvm-svn: 359071
2019-04-24 06:48:31 +00:00
Fangrui Song b5f3984541 [CommandLine] Provide parser<unsigned long> instantiation to allow cl::opt<uint64_t> on LP64 platforms
Summary:
And migrate opt<unsigned long long> to opt<uint64_t>

Fixes PR19665

Differential Revision: https://reviews.llvm.org/D60933

llvm-svn: 359068
2019-04-24 02:40:20 +00:00
Alina Sbirlea b341efce31 Revert [AliasAnalysis] AAResults preserves AAManager.
Triggers use-after-free.

llvm-svn: 359055
2019-04-24 00:28:29 +00:00
Francis Visoiu Mistrih 7fee2b89fd [Remarks] Add string deduplication using a string table
* Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section.

* Add parsing support for the string table in the RemarkParser.

From this remark:

```
--- !Missed
Pass:     inline
Name:     NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
            Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee:   printf
  - String:   ' will not be inlined into '
  - Caller:   printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                Line: 6, Column: 0 }
  - String:   ' because its definition is unavailable'
...
```

to:

```
--- !Missed
Pass: 0
Name: 1
DebugLoc: { File: 3, Line: 7, Column: 3 }
Function: 2
Args:
  - Callee:   4
  - String:   5
  - Caller:   2
    DebugLoc: { File: 3, Line: 6, Column: 0 }
  - String:   6
...
```

And the string table in the .remarks/__remarks section containing:

```
inline\0NoDefinition\0printArgsNoRet\0
test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0
will not be inlined into \0 because its definition is unavailable\0
```

This is mostly supposed to be used for testing purposes, but it gives us
a 2x reduction in the remark size, and is an incremental change for the
updates to the remarks file format.

Differential Revision: https://reviews.llvm.org/D60227

llvm-svn: 359050
2019-04-24 00:06:24 +00:00
Josh Stone 27924c3a3c [Lint] Permit aliasing noalias readonly arguments
Summary:
If two arguments are both readonly, then they have no memory dependency
that would violate noalias, even if they do actually overlap.

Reviewers: hfinkel, efriedma

Reviewed By: efriedma

Subscribers: efriedma, hiraditya, llvm-commits, tstellar

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60239

llvm-svn: 359047
2019-04-23 23:43:47 +00:00
Jessica Paquette 4fe7574d5d [AArch64][GlobalISel] Select G_INTRINSIC_ROUND
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.

llvm-svn: 359046
2019-04-23 23:03:03 +00:00
Jessica Paquette 9766bf1854 [AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcode
Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.

Add a regbankselect test to verify that we get what we expect here.

llvm-svn: 359044
2019-04-23 22:47:00 +00:00
Dmitry Mikulin 312b5f86b7 The error message for mismatched value sites is very cryptic.
Make it more readable for an average user.

Differential Revision: https://reviews.llvm.org/D60896

llvm-svn: 359043
2019-04-23 22:26:55 +00:00
Francis Visoiu Mistrih 1646851b87 [CGP] Look through bitcasts when duplicating returns for tail calls
The simple case of:

```
int *callee();
void *caller(void *a) {
  if (a == NULL)
    return callee();
  return a;
}
```

would generate a regular call instead of a tail call because we don't
look through the bitcast of the call to `callee` when duplicating the
return blocks.

Differential Revision: https://reviews.llvm.org/D60837

llvm-svn: 359041
2019-04-23 21:57:46 +00:00
Heejin Ahn b9f282d384 [WebAssembly] Emit br_table for most switch instructions
Summary:
Always convert switches to br_tables unless there is only one case,
which is equivalent to a simple branch. This reduces code size for wasm,
and we defer possible jump table optimizations to the VM.
Addresses PR41502.

Reviewers: kripken, sunfish

Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60966

llvm-svn: 359038
2019-04-23 21:30:30 +00:00
Amy Huang fc79ab9857 Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db)
buildbot failure.

This reverts commit d07d6d6177 and
c774f687b6.

llvm-svn: 359034
2019-04-23 21:12:58 +00:00
Jessica Paquette 3cc6d1f542 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
Alina Sbirlea 4fd1f266b1 [MemorySSA] LCSSA preserves MemorySSA.
Summary:
Enabling MemorySSA in the old pass manager leads to MemorySSA being run
twice due to the fact that LCSSA and LoopSimplify do not preserve
MemorySSA. This is the first step to address that: target LCSSA.

LCSSA does not make any changes that invalidate MemorySSA, so it
preserves it by design. It must preserve AA as well, for this to hold.

After this patch, MemorySSA is still run twice in the old pass manager.
Step two follows: target LoopSimplify.

Subscribers: mehdi_amini, jlebar, Prazek, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60832

llvm-svn: 359032
2019-04-23 20:59:44 +00:00
Jessica Paquette 991cb39242 [AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNC
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.

Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.

I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.

llvm-svn: 359030
2019-04-23 20:46:19 +00:00
Akira Hatanaka 5c3117b0a9 [ObjC][ARC] Check the basic block size before calling
DominatorTree::dominate.

ARC contract pass has an optimization that replaces the uses of the
argument of an ObjC runtime function call with the call result.

For example:

; Before optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %1, i8** @g0, align 8

; After optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %2, i8** @g0, align 8 // %1 is replaced with %2

Before replacing the argument use, DominatorTree::dominate is called to
determine whether the user instruction is dominated by the ObjC runtime
function call instruction. The call to DominatorTree::dominate can be
expensive if the two instructions belong to the same basic block and the
size of the basic block is large. This patch checks the basic block size
and just bails out if the size exceeds the limit set by command line
option "arc-contract-max-bb-size".

rdar://problem/49477063

Differential Revision: https://reviews.llvm.org/D60900

llvm-svn: 359027
2019-04-23 19:49:03 +00:00
David Blaikie 2f51176223 Reapply: "DebugInfo: Emit only one kind of accelerated access/name table""
Originally committed in r358931
Reverted in r358997

Seems this change made Apple accelerator tables miss names (because
names started respecting the CU NameTableKind GNU & assuming that
shouldn't produce accelerated names too), which is never correct (apple
accelerator tables don't have separators or CU lists - if present, they
must describe all names in all CUs).

Original Description:
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 359026
2019-04-23 19:00:45 +00:00
Teresa Johnson 867bc3951b [ThinLTO] Pass down opt level to LTO backend and handle -O0 LTO in new PM
Summary:
The opt level was not being passed down to the ThinLTO backend when
invoked via clang (for distributed ThinLTO).

This exposed an issue where the new PM was asserting if the Thin or
regular LTO backend pipelines were invoked with -O0 (not a new issue,
could be provoked by invoking in-process *LTO backends via linker using
new PM and -O0). Fix this similar to the old PM where -O0 only does the
necessary lowering of type metadata (WPD and LowerTypeTest passes) and
then quits, rather than asserting.

Reviewers: xur

Subscribers: mehdi_amini, inglorion, eraman, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits, pcc

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61022

llvm-svn: 359025
2019-04-23 18:56:19 +00:00
Nico Weber 6967da8ffa llvm-cvtres: Split addChild(ID) into two functions
Before, there was an IsData parameter. Now, there are two different
functions for data nodes and ID nodes. No behavior change, needed for a
follow-up change to make two data nodes (but not two ID nodes) with the
same ID an error.

For consistency, rename another addChild() overload to addNameChild().

llvm-svn: 359024
2019-04-23 18:46:53 +00:00
Jessica Paquette ede0b2e695 [AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNC
Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.

Update arm64-vfloatintrinsics.ll now that we can select it.

llvm-svn: 359022
2019-04-23 18:20:47 +00:00
Jessica Paquette 56342642a0 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
Nikita Popov f945429fed [ConstantRange] Add urem support
Add urem support to ConstantRange, so we can handle in in LVI. This
is an approximate implementation that tries to capture the most useful
conditions: If the LHS is always strictly smaller than the RHS, then
the urem is a no-op and the result is the same as the LHS range.
Otherwise the lower bound is zero and the upper bound is
min(LHSMax, RHSMax - 1).

Differential Revision: https://reviews.llvm.org/D60952

llvm-svn: 359019
2019-04-23 18:00:17 +00:00
Stanislav Mekhanoshin c464dddccb [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp
The second argument is flags, not subreg.

Differential Revision: https://reviews.llvm.org/D61031

llvm-svn: 359017
2019-04-23 17:59:26 +00:00
Jessica Paquette df5ce782ad [AArch64][GlobalISel] Legalize G_FMA for more vector types
Same as G_FCEIL, G_FABS, etc. Just move it into that rule.

Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.

llvm-svn: 359015
2019-04-23 17:37:56 +00:00
Alina Sbirlea a809e8e5e7 [AliasAnalysis] AAResults preserves AAManager.
Summary:
AAResults should not invalidate AAManager.
Update tests.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60914

llvm-svn: 359014
2019-04-23 17:21:18 +00:00
Jessica Paquette e50e6d2563 [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcode
Noticed an unnecessary fallback in arm64-vmul caused by this.

Also add a regbankselect test for G_FMA.

llvm-svn: 359013
2019-04-23 17:17:06 +00:00
Nico Weber e8f21b1a6b llvm-undname: Support demangling the spaceship operator
Also add a test for demanling the co_await operator.

llvm-svn: 359007
2019-04-23 16:20:27 +00:00
Philip Reames 2ce017026a [InstCombine] Convert a masked.load of a dereferenceable address to an unconditional load
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.

Differential Revision: https://reviews.llvm.org/D59703

llvm-svn: 359000
2019-04-23 15:25:14 +00:00
Sanjay Patel 12a561fa1b [x86] use psubus for more vsetcc lowering (PR39859)
Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1

...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'.
This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.

  $ cat before.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
	pxor	%xmm0, %xmm2
	pcmpgtw	-32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
	pand	%xmm2, %xmm0
	pandn	%xmm1, %xmm2
	por	%xmm2, %xmm0

  $ cat after.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [256,256,256,256,256,256,256,256]
	psubusw	%xmm0, %xmm2
	pxor	%xmm3, %xmm3
	pcmpeqw	%xmm2, %xmm3
	pand	%xmm3, %xmm0
	pandn	%xmm1, %xmm3
	por	%xmm3, %xmm0

  $ llvm-mca before.s -mcpu=haswell
  Iterations:        100
  Instructions:      600
  Total Cycles:      909
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    0.77
  IPC:               0.66
  Block RThroughput: 1.8

  $ llvm-mca after.s -mcpu=haswell
  Iterations:        100
  Instructions:      700
  Total Cycles:      409
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    1.71
  IPC:               1.71
  Block RThroughput: 1.8

Differential Revision: https://reviews.llvm.org/D60838

llvm-svn: 358999
2019-04-23 15:20:17 +00:00
Joerg Sonnenberger 6e7cc49d5c [SPARC] Use the correct register set for the "r" asm constraint.
64bit mode must use 64bit registers, otherwise assumptions about the top
half of the registers are made. Problem found by Takeshi Nakayama in
NetBSD.

llvm-svn: 358998
2019-04-23 15:15:33 +00:00
David Blaikie a2470a4653 Revert "DebugInfo: Emit only one kind of accelerated access/name table"
Regresses some apple_names situations - still investigating.

This reverts commit r358931.

llvm-svn: 358997
2019-04-23 15:03:24 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Lewis Revill df3cb477a3 [RISCV] Support assembling %tls_{ie,gd}_pcrel_hi modifiers
This patch adds support for parsing and assembling the %tls_ie_pcrel_hi
and %tls_gd_pcrel_hi modifiers.

Differential Revision: https://reviews.llvm.org/D55342

llvm-svn: 358994
2019-04-23 14:46:13 +00:00
Scott Linder 3eed961973 [AMDGPU] Fix hidden argument metadata duplication for V3
Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.

Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.

llvm-svn: 358992
2019-04-23 14:31:17 +00:00
Simon Pilgrim 0e4992ce27 [X86] Pull out collectConcatOps helper. NFCI.
Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series.

llvm-svn: 358989
2019-04-23 14:07:49 +00:00
Tim Northover 6af366be8a ARM: disallow add/sub to sp unless Rn is also sp.
The manual says that Thumb2 add/sub instructions are only allowed to modify sp
if the first source is also sp. This is slightly different from the usual rGPR
restriction since it's context-sensitive, so implement it in C++.

llvm-svn: 358987
2019-04-23 13:50:13 +00:00
Sanjay Patel 06ff5eae5b [DAGCombiner] generalize binop-of-splats scalarization
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.

Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.

llvm-svn: 358984
2019-04-23 13:16:41 +00:00
Nicolai Haehnle 7edae4c403 AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.

This then used to trigger an assertion when processing a dependent
phi instruction.

Change-Id: Id4949719f8298062fe476a25718acccc109113b6

Reviewers: llvm-commits

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60999

llvm-svn: 358983
2019-04-23 13:12:52 +00:00
Fedor Sergeev 652168a99b [CallSite removal] move InlineCost to CallBase usage
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.

Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636

llvm-svn: 358982
2019-04-23 12:43:27 +00:00
David Green c519d3c403 [ARM] Update check for CBZ in Ifcvt
The check for creating CBZ in constant island pass recently obtained the
ability to search backwards to find a Cmp instruction. The code in IfCvt should
mirror this to allow more conversions to the smaller form. The common code has
been pulled out into a separate function to be shared between the two places.

Differential Revision: https://reviews.llvm.org/D60090

llvm-svn: 358977
2019-04-23 12:11:26 +00:00
David Green 2f9eed6265 [ARM] Don't replicate instructions in Ifcvt at minsize
Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.

Differential Revision: https://reviews.llvm.org/D60089

llvm-svn: 358974
2019-04-23 11:46:58 +00:00
Simon Pilgrim ddd225d1a9 Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 358970
2019-04-23 11:16:16 +00:00
Simon Pilgrim e7a68fd93e Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 358969
2019-04-23 11:11:34 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Javed Absar 1cdc3dbc58 [AArch64] Add support for MTE intrinsics
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486

llvm-svn: 358963
2019-04-23 09:39:58 +00:00
Diogo N. Sampaio 2619f399f9 [ARM][FIX] Add missing f16.lane.vldN/vstN lowering
Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.

Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60874

llvm-svn: 358962
2019-04-23 09:36:39 +00:00
George Rimar b9ed9cb5d7 [llvm-mc] - Properly set the the address align field of the compressed sections.
About the compressed sections spec says:
(https://docs.oracle.com/cd/E37838_01/html/E36783/section_compression.html)
sh_addralign fields of the section header for a compressed section
reflect the requirements of the compressed section.

Currently, llvm-mc always puts uncompressed section alignment to sh_addralign.
It is not correct. zlib styled section contains an Elfxx_Chdr header,
so we should either use 4 or 8 values depending on the target
(Uncompressed section alignment is stored in ch_addralign field of the compression header).

GNU assembler version 2.31.1 also has this issue,
but in 2.32.51 it was already fixed. This is how it was found
during debugging of the https://bugs.llvm.org/show_bug.cgi?id=40482
actually.

Differential revision: https://reviews.llvm.org/D60965

llvm-svn: 358960
2019-04-23 09:16:53 +00:00
David Green 63a2aa715a [LSR] Limit the recursion for setup cost
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.

Thanks to @reames for the report and test case.

Differential Revision: https://reviews.llvm.org/D60944

llvm-svn: 358958
2019-04-23 08:52:21 +00:00
Sam Clegg 9da81421b8 [WebAssembly] Bail out of fastisel earlier when computing PIC addresses
This change partially reverts https://reviews.llvm.org/D54647 in favor
of bailing out during computeAddress instead.

This catches the condition earlier and handles more cases.

Differential Revision: https://reviews.llvm.org/D60986

llvm-svn: 358948
2019-04-23 03:43:26 +00:00
Chandler Carruth bbddf21f90 Revert "Use const DebugLoc&"
This reverts r358910 (git commit 2b74466530)

While this patch *seems* trivial and safe and correct, it is not. The
copies are actually load bearing copies. You can observe this with MSan
or other ways of checking for use-after-destroy, but otherwise this may
result in ... difficult to debug inexplicable behavior.

I suspect the issue is that the debug location is used after the
original reference to it is removed. The metadata backing it gets
destroyed as its last references goes away, and then we reference it
later through these const references.

llvm-svn: 358940
2019-04-23 01:42:07 +00:00
David Blaikie 68602ab2f3 DebugInfo: Emit only one kind of accelerated access/name table
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 358931
2019-04-22 22:45:11 +00:00
Sanjay Patel bf8aacb715 [SelectionDAG] move splat util functions up from x86 lowering
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.

There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().

llvm-svn: 358930
2019-04-22 22:43:36 +00:00
Michael Liao 389d5a3474 [AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
  packed literals. Even an instruction is `packed`, it may have operand
  requiring non-packed literal, such as `v_dot2_f32_f16`.

Reviewers: rampitec, arsenm, kzhuravl

Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60978

llvm-svn: 358922
2019-04-22 22:05:49 +00:00
Philip Reames d748689c7f [InstCombine] Eliminate stores to constant memory
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.

The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.

Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.

Differential Revision: https://reviews.llvm.org/D60659

llvm-svn: 358919
2019-04-22 20:28:19 +00:00
Philip Reames d8d9b7b20e [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify from InstCombine
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.  

llvm-svn: 358913
2019-04-22 19:30:01 +00:00
Matt Arsenault 2b74466530 Use const DebugLoc&
llvm-svn: 358910
2019-04-22 19:14:27 +00:00
Matt Arsenault f84ce75cd1 AMDGPU: Skip debug instructions in assert
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address

llvm-svn: 358909
2019-04-22 19:14:26 +00:00
Justin Bogner e90d5c8db0 [IPSCCP] Add missing `AssumptionCacheTracker` dependency
Back in August, r340525 introduced a dependency on the assumption
cache tracker in the ipsccp pass, but that commit missed a call to
INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache
improperly registered if SCCP is the only thing that pulls it in.

llvm-svn: 358903
2019-04-22 17:38:29 +00:00
Philip Reames 37104d7189 [LPM/BPI] Preserve BPI through trivial loop pass pipeline (e.g. LCSSA, LoopSimplify)
Currently, we do not expose BPI to loop passes at all. In the old pass manager, we appear to have been ignoring the fact that LCSSA and/or LoopSimplify didn't preserve BPI, and making it available to the following loop passes anyways.  In the new one, it's invalidated before running any loop pass if either LCSSA or LoopSimplify actually make changes. If they don't make changes, then BPI is valid and available.  So, we go ahead and teach LCSSA and LoopSimplify how to preserve BPI for consistency between old and new pass managers.

This patch avoids an invalidation between the two requires in the following trivial pass pipeline:
opt -passes="requires<branch-prob>,loop(no-op-loop),requires<branch-prob>"
(when the input file is one which requires either LCSSA or LoopSimplify to canonicalize the loops)

Differential Revision: https://reviews.llvm.org/D60790

llvm-svn: 358901
2019-04-22 17:13:43 +00:00
Wei Mi 01f8d556aa [PGO/SamplePGO][NFC] Move the function updateProfWeight from Instruction
to CallInst.

The issue was raised here: https://reviews.llvm.org/D60903#1472783

The function Instruction::updateProfWeight is only used for CallInst in
profile update. From the current interface, it is very easy to think that
the function can also be used for branch instruction. However, Branch
instruction does't need the scaling the function provides for
branch_weights and VP (value profile), in addition, scaling may introduce
inaccuracy for branch probablity.

The patch moves the function updateProfWeight from Instruction class to
CallInst to remove the confusion. The patch also changes the scaling of
branch_weights from a loop to a block because we know that ProfileData
for branch_weights of CallInst will only have two operands at most.

Differential Revision: https://reviews.llvm.org/D60911

llvm-svn: 358900
2019-04-22 17:04:51 +00:00
Matt Arsenault 2b6f76f05f AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources
llvm-svn: 358894
2019-04-22 15:22:46 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Nico Weber f5c7f3ad33 llvm-undname: Fix an assert-on-invalid, found by oss-fuzz
llvm-svn: 358891
2019-04-22 15:05:18 +00:00
Matt Arsenault 70346d127b AMDGPU: Fix not checking for copy when looking at copy src
Effectively reverts r356956. The check for isFullCopy was excessive,
but there still needs to be a check that this is a copy.

llvm-svn: 358890
2019-04-22 14:54:39 +00:00
Dmitry Preobrazhensky e2707f5aac [AMDGPU][MC] Corrected parsing of SP3 'neg' modifier
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60624

llvm-svn: 358888
2019-04-22 14:35:47 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Sanjay Patel 9bc6c77220 [DAGCombiner] make variable name less ambiguous; NFC
llvm-svn: 358886
2019-04-22 13:42:50 +00:00
Sanjay Patel d6989daae9 [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC
llvm-svn: 358884
2019-04-22 13:36:07 +00:00
Robert Widmann ff8febcb6d [LLVM-C] Add accessors to the default floating-point metadata node
Summary: Add a getter and setter pair for floating-point accuracy metadata.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60527

llvm-svn: 358883
2019-04-22 13:13:22 +00:00
Serguei Katkov 40a3b96196 [NewPM] Add Option handling for SimpleLoopUnswitch
This patch enables passing options to SimpleLoopUnswitch via the passes pipeline.

Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe
Reviewed By: fedor.sergeev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60676

llvm-svn: 358880
2019-04-22 10:35:07 +00:00
Nikita Popov 5aacc7a573 Revert "[ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC"
This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445.

After thinking about this more, this isn't right, the range is not exact
in the same sense as makeExactICmpRegion(). This needs a separate
function.

llvm-svn: 358876
2019-04-22 09:01:38 +00:00
Nikita Popov 5299e25f50 [ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC
Following D60632 makeGuaranteedNoWrapRegion() always returns an
exact nowrap region. Rename the function accordingly. This is in
line with the naming of makeExactICmpRegion().

llvm-svn: 358875
2019-04-22 08:36:05 +00:00
Craig Topper 5c43ab337f [X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is not enabled. Same for 256 bit and AVX.
llvm-svn: 358872
2019-04-22 06:12:02 +00:00
Lang Hames 1233c15be5 [JITLink] Remove a lot of reduntant 'JITLink_' prefixes. NFC.
llvm-svn: 358869
2019-04-22 03:03:09 +00:00
Lang Hames d3dac47aa2 [JITLink] Fix section start address calculation in eh-frame recorder.
Section atoms are not sorted, so we need to scan the whole section to find the
start address.

No test case: Found by inspection, and any reproduction would depend on pointer
ordering.

llvm-svn: 358865
2019-04-22 01:35:16 +00:00
Nico Weber ce67a41741 llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t strings
llvm-undname used to put '\x' in front of every pair of nibbles, but
u"\xD7\xFF" produces a string with 6 bytes: \xD7 \0 \xFF \0 (and \0\0). Correct
for a single character (plus terminating \0) is u\xD7FF instead.
Now, wchar_t, char16_t, and char32_t strings roundtrip from source to
clang-cl (and cl.exe) and then llvm-undname.

(...at least as long as it's not a string like L"\xD7FF" L"foo" which
gets demangled as L"\xD7FFfoo", where the compiler then considers the
"f" as part of the hex escape. That seems ok.)

Also add a comment saying that the "almost-valid" char32_t string I
added in my last commit is actually produced by compilers.

llvm-svn: 358857
2019-04-21 17:19:27 +00:00
Nico Weber 8fc9902bbb llvm-undname: Fix stack overflow on almost-valid
If a unsigned with all 4 bytes non-0 was passed to outputHex(), there
were two off-by-ones in it:

- Both MaxPos and Pos left space for the final \0, which left the buffer
  one byte to small. Set MaxPos to 16 instead of 15 to fix.

- The `assert(Pos >= 0);` was after a `Pos--`, move it up one line.

Since valid Unicode codepoints are <= 0x10ffff, this could never really
happen in practice.

Found by oss-fuzz.

llvm-svn: 358856
2019-04-21 16:58:25 +00:00
Nikita Popov 198ab60136 [ConstantRange] Add saturating add/sub methods
Add support for uadd_sat and friends to ConstantRange, so we can
handle uadd.sat and friends in LVI. The implementation is forwarding
to the corresponding APInt methods with appropriate bounds.

One thing worth pointing out here is that the handling of wrapping
ranges is not maximally accurate. A simple example is that adding 0
to a wrapped range will return a full range, rather than the original
wrapped range. The tests also only check that the non-wrapping
envelope is correct and minimal.

Differential Revision: https://reviews.llvm.org/D60946

llvm-svn: 358855
2019-04-21 15:23:05 +00:00
Nikita Popov dbc3fbafe7 [ConstantRange] Add getNonEmpty() constructor
ConstantRanges have an annoying special case: If upper and lower are
the same, it can be either an empty or a full set. When constructing
constant ranges nearly always a full set is intended, but this still
requires an explicit check in many places.

This revision adds a getNonEmpty() constructor that disambiguates this
case: If upper and lower are the same, a full set is created.

Differential Revision: https://reviews.llvm.org/D60947

llvm-svn: 358854
2019-04-21 15:22:54 +00:00
Nico Weber aa162682ca llvm-undname: Fix stack overflow on invalid found by oss-fuzz
llvm-svn: 358852
2019-04-21 14:25:07 +00:00
David Green 0d741507f7 [ARM] Rewrite isLegalT2AddressImmediate
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.

Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.

I have not attempted to do much with vectors yet. That will need changing once
MVE is added.

Differential Revision: https://reviews.llvm.org/D60677

llvm-svn: 358845
2019-04-21 09:54:29 +00:00
Craig Topper df02beb416 [X86] Add the rounding control operand to the printing for some scalar FMA instructions.
llvm-svn: 358844
2019-04-21 07:12:56 +00:00
Fangrui Song a0f9c4f72c [CachePruning] Simplify comparator
llvm-svn: 358843
2019-04-21 06:17:40 +00:00
Craig Topper 63db7e347b [X86] Don't form masked vfpclass instruction from and+vfpclass unless the fpclass only has a single use.
llvm-svn: 358841
2019-04-21 05:18:04 +00:00
Lang Hames a97032e947 [JITLink] Remove an overly strict error check in JITLink's eh-frame parser.
The error check required FDEs to refer to the most recent CIE, but the eh-frame
spec allows them to refer to any previously seen CIE. This patch removes the
offending check.

llvm-svn: 358840
2019-04-21 04:48:32 +00:00
Lang Hames 0191531a76 [JITLink] Factor basic common GOT and stub creation code into its own class.
llvm-svn: 358838
2019-04-21 03:14:42 +00:00
Nico Weber 8eeaf5178d llvm-undname: Improve string literal demangling with embedded \0 chars
- Don't assert when a string looks like a u32 string to the heuristic
  but doesn't have a length that's 0 mod 4.  Instead, classify those
  as u16 with embedded \0 chars. Found by oss-fuzz.
- Print embedded nul bytes as \0 instead of \x00.

llvm-svn: 358835
2019-04-20 23:59:06 +00:00
Nico Weber f2654b638d ftime-trace: Trace the name of the currently active pass as well.
Differential Revision: https://reviews.llvm.org/D60782

llvm-svn: 358834
2019-04-20 23:22:45 +00:00
Lang Hames 65e1ddd713 [JITLink] Add yet more detail to MachO/x86-64 unsupported relocation errors.
Knowing the address/symbolnum field values makes it easier to identify the
unsupported relocation, and provides enough information for the full bit
pattern of the relocation to be reconstructed.

llvm-svn: 358833
2019-04-20 22:59:43 +00:00
Lang Hames 5004abcd86 [JITLink][ORC] Add JITLink to the list of dependencies for ORC.
The new ObjectLinkingLayer in ORC depends on JITLink.

This should fix the build error at
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9621

llvm-svn: 358832
2019-04-20 22:15:57 +00:00
Lang Hames 7f77a231fa [JITLink] Fix a bad formatv format string.
llvm-svn: 358831
2019-04-20 22:06:12 +00:00
Amara Emerson 4286652556 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Lang Hames daed9b10f1 [JITLink] Add BinaryFormat to JITLink's dependencies.
Hopefully this will fix the missing dependence on llvm::identify_magic that is
showing up on some PPC bots. E.g.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9617

llvm-svn: 358827
2019-04-20 19:48:45 +00:00
Lang Hames c283fc5ebb [JITLink] Add more detail to MachO/x86-64 "unsupported relocation" errors.
The extra information here will be helpful in diagnosing errors, like the
ones currently occuring on the PPC big-endian bots. :)

llvm-svn: 358826
2019-04-20 18:50:13 +00:00
Lang Hames dfc3a4f6ff [JITLink] Silence some MSVC implicit cast warnings.
llvm-svn: 358824
2019-04-20 18:30:16 +00:00
Lang Hames b39109585a [JITLink] Use memset instead of bzero.
llvm-svn: 358822
2019-04-20 17:49:58 +00:00
Lang Hames 68b0b8c192 [JITLink] Fix a missing header and bad prototype.
llvm-svn: 358819
2019-04-20 17:29:57 +00:00
Lang Hames 11c8dfa583 Initial implementation of JITLink - A replacement for RuntimeDyld.
Summary:

JITLink is a jit-linker that performs the same high-level task as RuntimeDyld:
it parses relocatable object files and makes their contents runnable in a target
process.

JITLink aims to improve on RuntimeDyld in several ways:

(1) A clear design intended to maximize code-sharing while minimizing coupling.

RuntimeDyld has been developed in an ad-hoc fashion for a number of years and
this had led to intermingling of code for multiple architectures (e.g. in
RuntimeDyldELF::processRelocationRef) in a way that makes the code more
difficult to read, reason about, extend. JITLink is designed to isolate
format and architecture specific code, while still sharing generic code.

(2) Support for native code models.

RuntimeDyld required the use of large code models (where calls to external
functions are made indirectly via registers) for many of platforms due to its
restrictive model for stub generation (one "stub" per symbol). JITLink allows
arbitrary mutation of the atom graph, allowing both GOT and PLT atoms to be
added naturally.

(3) Native support for asynchronous linking.

JITLink uses asynchronous calls for symbol resolution and finalization: these
callbacks are passed a continuation function that they must call to complete the
linker's work. This allows for cleaner interoperation with the new concurrent
ORC JIT APIs, while still being easily implementable in synchronous style if
asynchrony is not needed.

To maximise sharing, the design has a hierarchy of common code:

(1) Generic atom-graph data structure and algorithms (e.g. dead stripping and
 |  memory allocation) that are intended to be shared by all architectures.
 |
 + -- (2) Shared per-format code that utilizes (1), e.g. Generic MachO to
       |  atom-graph parsing.
       |
       + -- (3) Architecture specific code that uses (1) and (2). E.g.
                JITLinkerMachO_x86_64, which adds x86-64 specific relocation
                support to (2) to build and patch up the atom graph.

To support asynchronous symbol resolution and finalization, the callbacks for
these operations take continuations as arguments:

  using JITLinkAsyncLookupContinuation =
      std::function<void(Expected<AsyncLookupResult> LR)>;

  using JITLinkAsyncLookupFunction =
      std::function<void(const DenseSet<StringRef> &Symbols,
                         JITLinkAsyncLookupContinuation LookupContinuation)>;

  using FinalizeContinuation = std::function<void(Error)>;

  virtual void finalizeAsync(FinalizeContinuation OnFinalize);

In addition to its headline features, JITLink also makes other improvements:

  - Dead stripping support: symbols that are not used (e.g. redundant ODR
    definitions) are discarded, and take up no memory in the target process
    (In contrast, RuntimeDyld supported pointer equality for weak definitions,
    but the redundant definitions stayed resident in memory).

  - Improved exception handling support. JITLink provides a much more extensive
    eh-frame parser than RuntimeDyld, and is able to correctly fix up many
    eh-frame sections that RuntimeDyld currently (silently) fails on.

  - More extensive validation and error handling throughout.

This initial patch supports linking MachO/x86-64 only. Work on support for
other architectures and formats will happen in-tree.

Differential Revision: https://reviews.llvm.org/D58704

llvm-svn: 358818
2019-04-20 17:10:34 +00:00
Craig Topper 3980d1ca6b [X86] Disable argument copy elision for arguments passed via pointers
Summary:
If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory.

The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.

This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix.

Reviewers: rnk

Reviewed By: rnk

Subscribers: dbabokin, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60801

llvm-svn: 358817
2019-04-20 15:26:44 +00:00
Luqman Aden 2993661cc0 [CorrelatedValuePropagation] Mark subs that we know not to wrap with nuw/nsw.
Summary:
Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative.

Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181)

Reviewers: sanjoy, apilipenko, nikic

Reviewed By: nikic

Subscribers: hiraditya, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60036

llvm-svn: 358816
2019-04-20 13:14:18 +00:00
Fangrui Song d3b2682351 [ExecutionDomainFix] Optimize a binary search insertion
llvm-svn: 358815
2019-04-20 13:00:50 +00:00
Fangrui Song dd0e833555 [llvm-symbolizer] Fix section index at the end of a section
This is very minor issue. The returned section index is only used by
DWARFDebugLine as an llvm::upper_bound input and the use case shouldn't
cause any behavioral change.

llvm-svn: 358814
2019-04-20 13:00:09 +00:00
Nikita Popov b75c8fc6fb [X86] Fix stack probing on x32 (PR41477)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI
with stack probing a dynamic alloca will result in a WIN_ALLOCA_32
with a 32-bit size. The current implementation tries to copy it into
RAX, resulting in a physreg copy error. Fix this by copying to EAX
instead. Also fix incorrect opcodes or registers used in subs.

llvm-svn: 358807
2019-04-20 07:25:46 +00:00
Craig Topper 4d4b5d952e [X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX.
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.

llvm-svn: 358805
2019-04-20 04:38:53 +00:00
Craig Topper 8b8264828c [X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx.
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.

llvm-svn: 358804
2019-04-20 04:38:49 +00:00
Sam Clegg fe8aabf9d9 [WebAssembly] Object: Improve error messages on invalid section
Also add a test.

Differential Revision: https://reviews.llvm.org/D60836

llvm-svn: 358801
2019-04-20 00:11:46 +00:00
Amara Emerson eac69e9377 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette d5c69e0836 [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Sam Clegg a27252794e [WebAssembly] FastISel: Don't fallback to SelectionDAG after BuildMI in selectCall
My understanding is that once BuildMI has been called we can't fallback
to SelectionDAG.

This change moves the fallback for when getRegForValue() fails for
that target of an indirect call.  This was failing in -fPIC mode when
the callee is GlobalValue.

Add a test case that tickles this.

Differential Revision: https://reviews.llvm.org/D60908

llvm-svn: 358793
2019-04-19 22:43:32 +00:00
Vedant Kumar 282b26ec4d [GVN+LICM] Use line 0 locations for better crash attribution
This is a follow-up to r291037+r291258, which used null debug locations
to prevent jumpy line tables.

Using line 0 locations achieves the same effect, but works better for
crash attribution because it preserves the right inline scope.

Differential Revision: https://reviews.llvm.org/D60913

llvm-svn: 358791
2019-04-19 22:36:40 +00:00
Eric Christopher dfebd84eb3 Remove the EnableEarlyCSEMemSSA set of options from the legacy
and new pass managers. They were default to true and not being
used.

Differential Revision: https://reviews.llvm.org/D60747

llvm-svn: 358789
2019-04-19 22:18:53 +00:00
Eli Friedman 1810339bc3 [AArch64] Fix checks for AArch64MCExpr::VK_SABS flag.
VK_SABS is part of the SymLoc bitfield in the variant kind which should
be compared for equality, not by checking the VK_SABS bit.

As far as I know, the existing code happened to produce the correct
results in all cases, so this is just a cleanup.

Patch by Stephen Crane.

Differential Revision: https://reviews.llvm.org/D60596

llvm-svn: 358788
2019-04-19 21:58:10 +00:00
Jessica Paquette ad69af3e95 [GlobalISel] Add IRTranslator support for G_FRINT
Add it as a simple intrinsic, update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60893

llvm-svn: 358787
2019-04-19 21:46:12 +00:00
Amy Huang d07d6d6177 Attempt to fix buildbot failure in commit 1bb57bac959ac163fd7d8a76d734ca3e0ecee6ab.
llvm-svn: 358786
2019-04-19 21:44:30 +00:00
Amy Huang c774f687b6 [MS] Emit S_HEAPALLOCSITE debug info
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: hans, rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60800

llvm-svn: 358783
2019-04-19 21:09:11 +00:00
Alina Sbirlea 43709f7233 [LICM & MemorySSA] Make limit flags pass tuning options.
Summary:
Make the flags in LICM + MemorySSA tuning options in the old and new
pass managers.

Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60490

llvm-svn: 358772
2019-04-19 17:46:50 +00:00
Amara Emerson 36c5baef49 Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"
This introduces some runtime failures which I'll need to investigate further.

llvm-svn: 358771
2019-04-19 17:42:13 +00:00
Jessica Paquette dfd87f6fa1 [GlobalISel][AArch64] Legalize vector G_FPOW
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.

Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.

Differential Revision: https://reviews.llvm.org/D60218

llvm-svn: 358764
2019-04-19 16:28:08 +00:00
Alina Sbirlea 0499a2f961 [NewPassManager] Adding pass tuning options: loop vectorize.
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.

Reviewers: chandlerc

Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59723

llvm-svn: 358763
2019-04-19 16:11:59 +00:00
Sanjay Patel e197c617a6 [SelectionDAG] soften splat mask assert/unreachable (PR41535)
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.

llvm-svn: 358761
2019-04-19 15:31:11 +00:00
Nico Weber e145a540cc llvm-undname: Attempt to fix leak-on-invalid found by oss-fuzz
llvm-svn: 358760
2019-04-19 14:13:11 +00:00
Florian Hahn b340497f76 [LTO] Add plumbing to save stats during LTO on Darwin.
Gold and ld on Linux already support saving stats, but the
infrastructure is missing on Darwin. Unfortunately it seems like the
configuration from lib/LTO/LTO.cpp is not used.

This patch adds a new LTOStatsFile option and adds plumbing in Clang to
use it on Darwin, similar to the way remarks are handled.

Currnetly the handling of LTO flags seems quite spread out, with a bunch
of duplication. But I am not sure if there is an easy way to improve
that?

Reviewers: anemet, tejohnson, thegameg, steven_wu

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D60516

llvm-svn: 358753
2019-04-19 12:36:41 +00:00
Bjorn Pettersson 238c9d6308 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
James Molloy 9ad4cb3de4 [PATCH] [MachineScheduler] Check pending instructions when an instruction is scheduled
Pending instructions that may have been blocked from being available by the HazardRecognizer may no longer may not be blocked any more when an instruction is scheduled; pending instructions should be re-checked in this case.

This is primarily aimed at VLIW targets with large parallelism and esoteric constraints.

No testcase as no in-tree targets have this behavior.

Differential revision: https://reviews.llvm.org/D60861

llvm-svn: 358743
2019-04-19 09:00:55 +00:00
Fangrui Song 7137b54a03 [MergeFunc] Delete unused FunctionNode::release()
llvm-svn: 358742
2019-04-19 08:03:20 +00:00
Fangrui Song 884f557bb2 [MergeFunc] removeUsers: call remove() only on direct users
removeUsers uses a work list to collect indirect users and call remove()
on those functions. However it has a bug (`if (!Visited.insert(UU).second)`).

Actually, we don't have to collect indirect users.
After the merge of F and G, G's callers will be considered (added to
Deferred). If G's callers can be merged, G's callers' callers will be
considered.

Update the test unnamed-addr-reprocessing.ll to make it clear we can
still merge indirect callers.

llvm-svn: 358741
2019-04-19 07:57:51 +00:00
Piotr Sobczak 72e2960e52 [AMDGPU] Ignore non-SUnits edges
Summary:
Ignore edges to non-SUnits (e.g. ExitSU) when checking
for low latency instructions.

When calling the function isLowLatencyInstruction(),
an ExitSU could be on the list of successors, not necessarily
a regular SU. In other places in the code there is a check
"Succ->NodeNum >= DAGSize" to prevent further processing of
ExitSU as "Succ->getInstr()" is NULL in such a case.
Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())"
has the guard, so it is clearly an omission here.

Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60864

llvm-svn: 358740
2019-04-19 06:19:14 +00:00
Chandler Carruth ce3f75df1f [CallSite removal] Move the legacy PM, call graph, and some inliner
code to `CallBase`.

This patch focuses on the legacy PM, call graph, and some of inliner and legacy
passes interacting with those APIs from `CallSite` to the new `CallBase` class.
No interesting changes.

Differential Revision: https://reviews.llvm.org/D60412

llvm-svn: 358739
2019-04-19 05:59:42 +00:00
Fangrui Song 82216048e6 [MergeFunc] Use less_first() as the comparator of Schwartzian transform
llvm-svn: 358738
2019-04-19 05:49:29 +00:00
Craig Topper bb769a2946 [X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx.
Could get further improvements by recognizing (i64 and (anyext (i32 shl))).

llvm-svn: 358737
2019-04-19 05:48:13 +00:00
Craig Topper f73caae956 [X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively
Summary:
There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code.

This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead.

matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere.

Reviewers: niravd, spatel, RKSimon, bkramer

Reviewed By: niravd

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60843

llvm-svn: 358735
2019-04-19 04:52:21 +00:00
Fangrui Song 9a331bba2a [DWARF] Use hasFileAtIndex to properly verify DWARF 5 after rL358732
llvm-svn: 358734
2019-04-19 03:34:28 +00:00
Ali Tamur 783d84bb39 [llvm] Prevent duplicate files in debug line header in dwarf 5: another attempt
Another attempt to land the changes in debug line header to prevent duplicate
files in Dwarf 5. I rolled back my previous commit because of a mistake in
generating the object file in a test. Meanwhile, I addressed some offline
comments and changed the implementation; the largest difference is that
MCDwarfLineTableHeader does not keep DwarfVersion but gets it as a parameter. I
also merged the patch to fix two lld tests that will strt to fail into this
patch.

Original Commit:

https://reviews.llvm.org/D59515

Original Message:
Motivation: In previous dwarf versions, file name indexes started from 1, and
the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes
the primary source file to be explicitly given an entry with an index number 0.

The current implementation honors the specification by just duplicating the
main source file, once with index number 0, and later maybe with another
index number. While this is compliant with the letter of the standard, the
duplication causes problems for consumers of this information such as lldb.
(Some files are duplicated, where only some of them have a line table although
all refer to the same file)

With this change, dwarf 5 debug line section files always start from 0, and
the zeroth entry is not duplicated whenever possible. This requires different
handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns
an index zero for a file name, it signals an error in dwarf 4, but not in dwarf
5) However, I think the minor complication is worth it, because it enables all
consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the
file name list homogenously.

llvm-svn: 358732
2019-04-19 02:26:56 +00:00
Fangrui Song acc7641bcb [APInt] Optimize umul_ov
Change two costly udiv() calls to lshr(1)*RHS + left-shift + plus

On one 64-bit umul_ov benchmark, I measured an obvious improvement: 12.8129s -> 3.6257s

Note, there may be some value to special case 64-bit (the most common
case) with __builtin_umulll_overflow().

Differential Revision: https://reviews.llvm.org/D60669

llvm-svn: 358730
2019-04-19 02:06:06 +00:00
Saleem Abdulrasool b96d9b3419 MergeFunc: preserve COMDAT information when creating a thunk
We would previously drop the COMDAT on the thunk we generated when replacing a
function body with the forwarding thunk. This would result in a function that
may have been multiply emitted and multiply merged to be emitted with the same
name without the COMDAT. This is a hard error with PE/COFF where the COMDAT is
used for the deduplication of Value Witness functions for Swift.

llvm-svn: 358728
2019-04-19 01:48:36 +00:00
Alina Sbirlea da0f71af7d [LoopUnroll] Move list of params into a struct [NFCI].
Summary: Cleanup suggested in review of r358304.

Reviewers: sanjoy, efriedma

Subscribers: jlebar, zzheng, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60638

llvm-svn: 358723
2019-04-18 23:43:49 +00:00
Adrian Prantl fac7875704 Implement sys::fs::copy_file using the macOS copyfile(3) API
to support APFS clones.

This patch adds a Darwin-specific implementation of
llvm::sys::fs::copy_file() that uses the macOS copyfile(3) API to
support APFS copy-on-write clones, which should be faster and much
more space efficient.

https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/ToolsandAPIs/ToolsandAPIs.html

Differential Revision: https://reviews.llvm.org/D60802

This reapplies 358628 with an additional bugfix handling the case
where the destination file already exists. (Caught by the clang testsuite).

llvm-svn: 358716
2019-04-18 21:22:50 +00:00
Jessica Paquette 0aa9b453c4 [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.

We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.

This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.

Differential Revision: https://reviews.llvm.org/D60881

llvm-svn: 358715
2019-04-18 21:15:48 +00:00
Jessica Paquette 3b5119c684 [GlobalISel][AArch64] Legalize v8s8 loads
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.

Differential Revision: https://reviews.llvm.org/D60877

llvm-svn: 358714
2019-04-18 21:13:58 +00:00
Nico Weber a0ac65c98f llvm-undname: Fix two more asserts-on-invalid, found by oss-fuzz
llvm-svn: 358708
2019-04-18 19:52:32 +00:00
Nico Weber 502cf4bd19 llvm-undname: Fix two asserts-on-invalid
llvm-svn: 358707
2019-04-18 19:30:21 +00:00
Philip Reames 137995d8da [GuardWidening] Wire up a NPM version of the LoopGuardWidening pass
llvm-svn: 358704
2019-04-18 19:17:14 +00:00
Michael Berg d573aa0156 [NFC] FMF propagation for GlobalIsel
llvm-svn: 358702
2019-04-18 18:48:57 +00:00
Quentin Colombet ea3364bf85 [BlockExtractor] Extend the file format to support the grouping of basic blocks
Prior to this patch, each basic block listed in the extrack-blocks-file
would be extracted to a different function.

This patch adds the support for comma separated list of basic blocks
to form group.

When the region formed by a group is not extractable, e.g., not single
entry, all the blocks of that group are left untouched.

Let us see this new format in action (comments are not part of the
file format):
;; funcName bbName[,bbName...]
   foo      bb1        ;; Extract bb1 in its own function
   foo      bb2,bb3    ;; Extract bb2,bb3 in their own function
   bar      bb1,bb4    ;; Extract bb1,bb4 in their own function
   bar      bb2        ;; Extract bb2 in its own function

Assuming all regions are extractable, this will create one function and
thus one call per region.

Differential Revision: https://reviews.llvm.org/D60746

llvm-svn: 358701
2019-04-18 18:28:30 +00:00
Simon Pilgrim 4171a91e92 [X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask
combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together.

This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).

This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.

My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.

Differential Revision: https://reviews.llvm.org/D60375#inline-539623

llvm-svn: 358692
2019-04-18 17:23:09 +00:00
Philip Reames adf288c5d9 [LoopPred] Fix a blatantly obvious bug in r358684
The bug is that I didn't check whether the operand of the invariant_loads were themselves invariant.  I don't know how this got missed in the patch and review.  I even had an unreduced test case locally, and I remember handling this case, but I must have lost it in one of the rebases.  Oops.

llvm-svn: 358688
2019-04-18 17:01:19 +00:00
Philip Reames 92a7177e6b [LoopPredication] Allow predication of loop invariant computations (within the loop)
The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i'
A = _a.length;
guard (i < A)
a = _a[i]
B = _b.length;
guard (i < B);
b = _b[i];
...
Z = _z.length;
guard (i < Z)
z = _z[i]
accum += a + b + ... + z;

Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form.

Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later.

As a secondary benefit, this allows LoopPred to expose peeling oppurtunities in a much more obvious manner.  See the udiv test changes as an example.  If the udiv was not hoistable (i.e. we couldn't prove speculation safety) this would be an example where peeling becomes obviously profitable whereas it wasn't before.

A couple of subtleties in the implementation:
- SCEV's isSafeToExpand guarantees speculation safety (i.e. let's us expand at a new point).  It is not a precondition for expansion if we know the SCEV corresponds to a Value which dominates the requested expansion point.
- SCEV's isLoopInvariant returns true for expressions which compute the same value across all iterations executed, regardless of where the original Value is located.  (i.e. it can be in the loop)  This implies we have a speculation burden to prove before expanding them outside loops.
- invariant_loads and AA->pointsToConstantMemory are two cases that SCEV currently does not handle, but meets the SCEV definition of invariance.  I plan to sink this part into SCEV once this has baked for a bit.

Differential Revision: https://reviews.llvm.org/D60093

llvm-svn: 358684
2019-04-18 16:33:17 +00:00
Nicolai Haehnle 523f90a2ba [SDA] Bug fix: Use IPD outside the loop as divergence bound
Summary:
The immediate post dominator of the loop header may be part of the divergent loop.
Since this /was/ the divergence propagation bound the SDA would not detect joins of divergent paths outside the loop.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: mmasten, arsenm, jvesely, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59042

llvm-svn: 358681
2019-04-18 16:17:35 +00:00
Philip Reames b2c9fc02d5 Fix a bug in SCEV's isSafeToExpand around speculation safety
isSafeToExpand was making a common, but dangerously wrong, mistake in assuming that if any instruction within a basic block executes, that all instructions within that block must execute.  This can be trivially shown to be false by considering the following small example:
bb:
  add x, y  <-- InsertionPoint
  call @throws()
  udiv x, y <-- SCEV* S
  br ...

It's clearly not legal to expand S above the throwing call, but the previous logic would do so since S dominates (but not properlyDominates) the block containing the InsertionPoint.

Since iterating instructions w/in a block is expensive, this change special cases two cases: 1) S is an operand of InsertionPoint, and 2) InsertionPoint is the terminator of it's block.  These two together are enough to keep all current optimizations triggering while fixing the latent correctness issue.

As best I can tell, this is a silent bug in current ToT.  Given that, there's no tests with this change.  It was noticed in an upcoming optimization change (D60093), and was reviewed as part of that.  That change will include the test which caused me to notice the issue.  I'm submitting this seperately so that anyone bisecting a problem gets a clear explanation.

llvm-svn: 358680
2019-04-18 16:10:21 +00:00
Benjamin Kramer 7085795284 MinidumpYAML: Fix ambiguity between std::make_unique and llvm::make_unique
llvm-svn: 358673
2019-04-18 15:06:03 +00:00
Pavel Labath 7429d86f36 MinidumpYAML: Add support for ModuleList stream
Summary:
This patch adds support for yaml (de)serialization of the minidump
ModuleList stream. It's a fairly straight forward-application of the
existing patterns to the ModuleList structures defined in previous
patches.

One thing, which may be interesting to call out explicitly is the
addition of "new" allocation functions to the helper BlobAllocator
class. The reason for this was, that there was an emerging pattern of a
need to allocate space for entities, which do not have a suitable
lifetime for use with the existing allocation functions. A typical
example of that was the "size" of various lists, which is only available
as a temporary returned by the .size() method of some container. For
these cases, one can use the new set of allocation functions, which
will take a temporary object, and store it in an allocator-managed
buffer until it is written to disk.

Reviewers: amccarth, jhenderson, clayborg, zturner

Subscribers: lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60405

llvm-svn: 358672
2019-04-18 14:57:31 +00:00
Simon Pilgrim 8f87e53462 [X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2.
This replaces the MOVMSK combine introduced at D52121/rL342326

(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))

with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.

Differential Revision: https://reviews.llvm.org/D60625

llvm-svn: 358651
2019-04-18 09:58:59 +00:00
Serguei Katkov ca6c03a22f [NewPM] Add Option handling for LoopVectorize
This patch enables passing options to LoopVectorizePass via the passes pipeline.

Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe
Reviewed By: fedor.sergeev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60681

llvm-svn: 358647
2019-04-18 08:46:11 +00:00
Kang Zhang 009a21d2fd [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177

When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.

This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
 `getScalarSizeInBits() / 8`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60811

llvm-svn: 358644
2019-04-18 07:24:15 +00:00
Eric Christopher eff3b6fe7f Elaborate why we have an option on by default for enabling chr.
llvm-svn: 358641
2019-04-18 06:17:40 +00:00
Tim Renouf 7c55c8d8c3 [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.

Differential Revision: https://reviews.llvm.org/D60633

Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
2019-04-18 05:27:01 +00:00
Ali Tamur 6263365b08 Fix a typo in comments. [NFC]
llvm-svn: 358639
2019-04-18 02:39:37 +00:00
Aditya Nandakumar 9266337656 [GISel]:IRTranslator: Prefer a buidInstr form that allows CSE of cast instructions
https://reviews.llvm.org/D60844

Use the style of buildInstr that allows CSEing.

llvm-svn: 358637
2019-04-18 02:19:29 +00:00
Richard Trieu 7b6192025e Fix bad compare function over FusionCandidate.
Reverse the checking of the domiance order so that when a self compare happens,
it returns false.  This makes compare function have strict weak ordering.

llvm-svn: 358636
2019-04-18 01:39:45 +00:00
Adrian Prantl 00d97ea202 Revert Implement sys::fs::copy_file using the macOS copyfile(3) API to support APFS clones.
This reverts r358628 (git commit 91a06bee78)
while investigating a crash reproducer bot failure.

llvm-svn: 358634
2019-04-18 01:21:10 +00:00
Adrian Prantl 91a06bee78 Implement sys::fs::copy_file using the macOS copyfile(3) API
to support APFS clones.

This patch adds a Darwin-specific implementation of
llvm::sys::fs::copy_file() that uses the macOS copyfile(3) API to
support APFS copy-on-write clones, which should be faster and much
more space efficient.

https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/ToolsandAPIs/ToolsandAPIs.html

Differential Revision: https://reviews.llvm.org/D60802

llvm-svn: 358628
2019-04-18 00:01:05 +00:00
Akira Hatanaka 0b19f5aef9 Fix formatting. NFC
llvm-svn: 358623
2019-04-17 23:14:39 +00:00
Sanjay Patel fb363a778f [x86] try to widen 'shl' as part of LEA formation
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ

%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0

...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).

We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.

Differential Revision: https://reviews.llvm.org/D60789

llvm-svn: 358622
2019-04-17 22:38:51 +00:00
Denis Bakhvalov cfd25a4b0e Test commit by Denis Bakhvalov
Change-Id: I4d85123a157d957434902fb14ba50926b2d56212
llvm-svn: 358619
2019-04-17 22:27:30 +00:00
Nick Desaulniers 9609ce2f33 [AsmPrinter] hoist %a output template to base class for ARM+Aarch64
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.

Reviewers: echristo, peter.smith

Reviewed By: echristo

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60841

llvm-svn: 358618
2019-04-17 22:21:10 +00:00
Amara Emerson d51adf0568 Add a getSizeInBits() accessor to MachineMemOperand. NFC.
Cleans up a bunch of places where we do getSize() * 8.

Differential Revision: https://reviews.llvm.org/D60799

llvm-svn: 358617
2019-04-17 22:21:05 +00:00
Amara Emerson daf6e66ac5 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00
Kit Barton 3cdf87940f Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Differential Revision: https://reviews.llvm.org/D55851

llvm-svn: 358607
2019-04-17 18:53:27 +00:00
Nick Desaulniers a2077bab40 [AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.

Reviewers: peter.smith, echristo

Reviewed By: echristo

Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60803

llvm-svn: 358603
2019-04-17 18:22:48 +00:00
Steven Wu 05a358cdcd [ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols
Summary:
Reapply r357931 with fixes to ThinLTO testcases and llvm-lto tool.

ThinLTOCodeGenerator currently does not preserve llvm.used symbols and
it can internalize them. In order to pass the necessary information to the
legacy ThinLTOCodeGenerator, the input to the code generator is
rewritten to be based on lto::InputFile.

Now ThinLTO using the legacy LTO API will requires data layout in
Module.

"internalize" thinlto action in llvm-lto is updated to run both
"promote" and "internalize" with the same configuration as
ThinLTOCodeGenerator. The old "promote" + "internalize" option does not
produce the same output as ThinLTOCodeGenerator.

This fixes: PR41236
rdar://problem/49293439

Reviewers: tejohnson, pcc, kromanova, dexonsmith

Reviewed By: tejohnson

Subscribers: ormris, bd1976llvm, mehdi_amini, inglorion, eraman, hiraditya, jkorous, dexonsmith, arphaman, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60421

llvm-svn: 358601
2019-04-17 17:38:09 +00:00
Philip Reames 88679717ce [InstCombine] Factor out unreachable inst idiom creation [NFC]
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable.  We can't actually insert the unreachable instruction since that would require changing the CFG.  We leave that to simplifycfg later.

This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.

llvm-svn: 358600
2019-04-17 17:37:58 +00:00
Nikita Popov 2039581002 [LVI][CVP] Constrain values in with.overflow branches
If a branch is conditional on extractvalue(op.with.overflow(%x, C), 1)
then we can constrain the value of %x inside the branch based on
makeGuaranteedNoWrapRegion(). We do this by extending the edge-value
handling in LVI. This allows CVP to then fold comparisons against %x,
as illustrated in the tests.

Differential Revision: https://reviews.llvm.org/D60650

llvm-svn: 358597
2019-04-17 16:57:42 +00:00
Dmitry Preobrazhensky 394d0a1637 [AMDGPU][MC] Corrected handling of "-" before expressions
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60622

llvm-svn: 358596
2019-04-17 16:56:34 +00:00
Rhys Perry c2814e12e7 AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592
2019-04-17 16:31:52 +00:00
Florian Hahn 893aea58ea [LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size.
Summary:
In the following cases, unrolling can be beneficial, even when
optimizing for code size:
 1) very low trip counts
 2) potential to constant fold most instructions after fully unrolling.

We can unroll in those cases, by setting the unrolling threshold to the
loop size. This might highlight some cost modeling issues and fixing
them will have a positive impact in general.

Reviewers: vsk, efriedma, dmgreen, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D60265

llvm-svn: 358586
2019-04-17 15:57:43 +00:00
Simon Pilgrim e7fe6dd5ed [DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well
The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant.

llvm-svn: 358585
2019-04-17 15:45:44 +00:00
Lang Hames c1106c9b11 [Support] Add LEB128 support to BinaryStreamReader/Writer.
Summary:
This patch adds support for ULEB128 and SLEB128 encoding and decoding to
BinaryStreamWriter and BinaryStreamReader respectively.

Support for ULEB128/SLEB128 will be used for eh-frame parsing in the JITLink
library currently under development (see https://reviews.llvm.org/D58704).

Reviewers: zturner, dblaikie

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60810

llvm-svn: 358584
2019-04-17 15:38:27 +00:00
Florian Hahn 258a425c69 [ScheduleDAGRRList] Recompute topological ordering on demand.
Currently there is a single point in ScheduleDAGRRList, where we
actually query the topological order (besides init code). Currently we
are recomputing the order after adding a node (which does not have
predecessors) and then we add predecessors edge-by-edge.

We can avoid adding edges one-by-one after we added a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the updates to the ordering.

Also, we can delay updating the DAG until we query the DAG, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.

This brings down the geomean compile time for of CTMark with -O1 down 0.3% on X86,
with no regressions.

Reviewers: MatzeB, atrick, efriedma, niravd, paquette

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60125

llvm-svn: 358583
2019-04-17 15:05:29 +00:00
Dmitry Preobrazhensky 20d52e3aa2 [AMDGPU][MC] Corrected parsing of registers
See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60621

llvm-svn: 358581
2019-04-17 14:44:01 +00:00
Tim Renouf 59e8bd3093 [AMDGPU] Flag new raw/struct atomic ops as source of divergence
Differential Revision: https://reviews.llvm.org/D60731

Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d
llvm-svn: 358579
2019-04-17 14:04:31 +00:00
Robert Widmann d909a5ed8d [LLVM-C] Add DIFile Field Accesssors
Summary:
Add accessors for the file, directory, source file name (curiously, an `Optional` value?), of a DIFile.

This is intended to replace the LLVMValueRef-based accessors used in D52239

Reviewers: whitequark, jberdine, deadalnix

Reviewed By: whitequark, jberdine

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60489

llvm-svn: 358577
2019-04-17 13:29:14 +00:00
Simon Pilgrim 9daacec816 [CostModel][X86] Add bool anyof/allof reduction costs
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest......

Differential Revision: https://reviews.llvm.org/D60403

llvm-svn: 358574
2019-04-17 10:58:19 +00:00
Fangrui Song a364d599ab [DWARF] llvm::Error -> Error. NFC
The unqualified name is more common and is used in the file as well.

llvm-svn: 358567
2019-04-17 09:11:08 +00:00
Fangrui Song c82e92bca8 Change some llvm::{lower,upper}_bound to llvm::bsearch. NFC
llvm-svn: 358564
2019-04-17 07:58:05 +00:00
Roman Lebedev 0080645846 [CVP] processOverflowIntrinsic(): don't crash if constant-holding happened
As reported by Mikael Holmén in post-commit review in
https://reviews.llvm.org/D60791#1469765

llvm-svn: 358559
2019-04-17 06:35:07 +00:00
Fangrui Song df44ff1b78 [DWARF] Pass ReferenceToDIEOffsets elements by reference
llvm-svn: 358558
2019-04-17 06:33:52 +00:00
Craig Topper 6bf0802738 [X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for GR32<->XMM register copies.
We have two versions of some instructions, VR128 versions and FR32 versions that
are marked as CodeGenOnly.

This change switches to using the VR128 versions for these copies. It's after
register allocation so the class size no longer matters. This matches how GR64
works.

llvm-svn: 358555
2019-04-17 06:09:11 +00:00
Eric Christopher e29874eaa0 Revert "Add basic loop fusion pass." Per request.
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358553
2019-04-17 04:55:24 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher 0ebbf72a63 Remove the run-slp-after-loop-vectorization option.
It's been on by default for 4 years and cleans up the pass
hierarchy.

llvm-svn: 358548
2019-04-17 02:26:27 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Kit Barton ab70da0728 Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Phabricator: https://reviews.llvm.org/D55851
llvm-svn: 358543
2019-04-17 01:37:00 +00:00
Robert Widmann d6eb4bb801 [LLVM-C] Add Accessors For Global Variable Metadata Properties
Summary: Metadata for a global variable is really a  (GlobalVariable, Expression) tuple.  Allow access to these, then allow retrieving the file, scope, and line for a DIVariable, whether global or local.  This should be the last of the accessors required for uniform access to location and file information metadata.

Reviewers: jberdine, whitequark, deadalnix

Reviewed By: jberdine, whitequark

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60725

llvm-svn: 358532
2019-04-16 21:39:48 +00:00
Ali Tamur e8de5cd602 Fix a typo in comments. [NFC]
llvm-svn: 358531
2019-04-16 21:37:43 +00:00
Nick Desaulniers 3271ca01fe [NVPTXAsmPrinter] clean up dead code. NFC
Summary:
The printOperand function takes a default parameter, for which there are
zero call sites that explicitly pass such a parameter.  As such, there
is no case to support. This means that the method
printVecModifiedImmediate is purly dead code, and can be removed.

The eventual goal for some of these AsmPrinter refactoring is to have
printOperand be a virtual method; making it easier to print operands
from the base class for more generic Asm printing. It will help if all
printOperand methods have the same function signature (ie. no Modifier
argument when not needed).

Reviewers: echristo, tra

Reviewed By: echristo

Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60727

llvm-svn: 358527
2019-04-16 21:04:34 +00:00
Simon Pilgrim e5573f4f4e [TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359)
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.

shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask

preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair

llvm-svn: 358526
2019-04-16 20:57:28 +00:00
Sanjay Patel e08783e2f5 [EarlyCSE] detect equivalence of selects with inverse conditions and commuted operands (PR41101)
This is 1 of the problems discussed in the post-commit thread for:
rL355741 / http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190311/635516.html
and filed as:
https://bugs.llvm.org/show_bug.cgi?id=41101

Instcombine tries to canonicalize some of these cases (and there's room for improvement
there independently of this patch), but it can't always do that because of extra uses.
So we need to recognize these commuted operand patterns here in EarlyCSE. This is similar
to how we detect commuted compares and commuted min/max/abs.

Differential Revision: https://reviews.llvm.org/D60723

llvm-svn: 358523
2019-04-16 20:41:20 +00:00
Anton Afanasyev 3a00b020aa Time profiler: optimize json output time
Summary:
Use llvm::json::Array.reserve() to optimize json output time. Here is motivation:
https://reviews.llvm.org/D60609#1468941. In short: for the json array
with ~32K entries, pushing back each entry takes ~4% of whole time compared
to the method of preliminary memory reservation: (3995-3845)/3995 = 3.75%.

Reviewers: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60792

llvm-svn: 358522
2019-04-16 20:36:56 +00:00
Nikita Popov 52b24ee932 [CVP] Simplify umulo and smulo that cannot overflow
If a umul.with.overflow or smul.with.overflow operation cannot
overflow, simplify it to a simple mul nuw / mul nsw. After the
refactoring in D60668 this is just a matter of removing an
explicit check against multiplications.

Differential Revision: https://reviews.llvm.org/D60791

llvm-svn: 358521
2019-04-16 20:31:41 +00:00
Simon Pilgrim 82ffa88a04 [SLP] Refactoring of the operand reordering code.
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59973

llvm-svn: 358519
2019-04-16 19:27:00 +00:00
Simon Pilgrim d769bb1e58 [X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.

This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.

Differential Revision: https://reviews.llvm.org/D60562

llvm-svn: 358516
2019-04-16 19:18:53 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Krzysztof Parzyszek ef6823ec8d [Hexagon] Remove indeterministic traversal order
Patch by Sergei Larin.

llvm-svn: 358505
2019-04-16 16:05:07 +00:00
Luis Marques eda370d4c8 [DAGCombiner] Add missing flag to addressing mode check
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.

Differential Revision:  https://reviews.llvm.org/D60314

llvm-svn: 358502
2019-04-16 15:09:18 +00:00
Luis Marques 20d2424016 [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.

Differential Revision: https://reviews.llvm.org/D59477

llvm-svn: 358498
2019-04-16 14:38:32 +00:00
Kadir Cetinkaya 8fdc5abffe [llvm][Support] Provide interface to set thread priorities
Summary:
We have a multi-platform thread priority setting function(last piece
landed with D58683), I wanted to make this available to all llvm community,
there seem to be other users of such functionality with portability fixmes:
lib/Support/CrashRecoveryContext.cpp
tools/clang/tools/libclang/CIndex.cpp

Reviewers: gribozavr, ioeric

Subscribers: krytarowski, jfb, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59130

llvm-svn: 358494
2019-04-16 14:32:43 +00:00
Nico Weber 930994ce14 llvm-undname: Consistently use "return nullptr" in functions returning pointers
llvm-svn: 358492
2019-04-16 14:24:42 +00:00
Nico Weber c035c243da llvm-undname: Fix nullptr deref on invalid structor names in template args
Similar to r358421: A StructorIndentifierNode has a Class field which
is read when printing it, but if the StructorIndentifierNode appears in
a template argument then demangleFullyQualifiedSymbolName() which sets
Class isn't called. Since StructorIndentifierNodes are always leaf
names, we can just reject them as well.

Found by oss-fuzz.

llvm-svn: 358491
2019-04-16 14:10:34 +00:00
Hans Wennborg 21eb771dcb Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483
2019-04-16 12:13:25 +00:00
Hans Wennborg 6ae05777b8 Asan use-after-scope: don't poison allocas if there were untraced lifetime intrinsics in the function (PR41481)
If there are any intrinsics that cannot be traced back to an alloca, we
might have missed the start of a variable's scope, leading to false
error reports if the variable is poisoned at function entry. Instead, if
there are some intrinsics that can't be traced, fail safe and don't
poison the variables in that function.

Differential revision: https://reviews.llvm.org/D60686

llvm-svn: 358478
2019-04-16 07:54:20 +00:00
Anton Afanasyev 6547d51458 Use native llvm JSON library for time profiler output
Summary: Replace plain json text output with llvm JSON library wrapper using.

Reviewers: takuto.ikuta, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60609

llvm-svn: 358476
2019-04-16 06:35:07 +00:00