Commit Graph

1672 Commits

Manoj Gupta d536180fdc [AArch64]: add 'a' inline asm operand modifier.
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.

Reviewed by: Renato Golin

Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls

Subscribers: aemerson, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33558

llvm-svn: 303901
2017-05-25 19:07:57 +00:00
Francis Visoiu Mistrih 1c98701e57 AsmPrinter: mark the beginning and the end of a function in verbose mode
llvm-svn: 303690
2017-05-23 21:22:16 +00:00
Florian Hahn abb4218b98 [AArch64] Make instruction fusion more aggressive.
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
  SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
   similar to GenericScheduler::tryCandidate.

This change increases the number of AES instruction pairs generated on
 Cortex-A57 and Cortex-A72. This doesn't change code at all in
 most benchmarks or general code, but we've seen improvement on kernels
 using AESE/AESMC and AESD/AESIMC. 

Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB

Reviewed By: evandro

Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33230

llvm-svn: 303618
2017-05-23 09:33:34 +00:00
Akira Hatanaka e8ae3346a3 [AArch64] Fix PR33100.
This commit fixes a bug introduced in r301019 where optimizeLogicalImm
would replace a logical node's immediate operand that was CSE'd and
was also an operand of another node.

This commit fixes the bug by replacing the logical node instead of its
immediate operand.

rdar://problem/32295276

llvm-svn: 303607
2017-05-23 06:08:37 +00:00
Dehao Chen 00549e47bd update the test that should have been updated in r303292. (NFC)
llvm-svn: 303298
2017-05-17 20:44:08 +00:00
Dehao Chen 02828a93e8 Only enable LiveRangeShrink for x86.
Summary: Moving LiveRangeShrink to x86, as this pass is mostly useful for architectures with high register pressure.

Reviewers: MatzeB, qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, jyknight, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33294

llvm-svn: 303292
2017-05-17 20:18:13 +00:00
Amara Emerson c9916d7e97 Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.

llvm-svn: 303211
2017-05-16 21:29:22 +00:00
Nirav Dave da8f221273 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This causes small improvements when dealing with intrinsics
lowered to stores.
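
For illustration (my example, not from the patch), the first store below is
chained straight into a second store to the same address and has no
observable effect:

  // Sketch: the first store is immediately overwritten through the same
  // pointer with no intervening load or side effect, so it can be elided.
  void overwrite(int *p) {
    *p = 1;  // elided
    *p = 2;  // the only store that survives
  }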

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Tim Northover 203c6f055d AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

llvm-svn: 303118
2017-05-15 21:51:38 +00:00
Hans Wennborg bd6e9e77a7 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Florian Hahn af91e7e6d2 [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.

llvm-svn: 303073
2017-05-15 15:15:22 +00:00
Dehao Chen 65dd23e273 Add LiveRangeShrink pass to shrink live range within BB.
Summary: The LiveRangeShrink pass moves an instruction to right after the definition of its operands within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees an optimal live range within the BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

llvm-svn: 302938
2017-05-12 19:29:27 +00:00
Chad Rosier aeffffdb44 [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD.
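
A hedged source-level illustration (mine, not from the patch): when fusing
FMUL/FSUB is permitted (e.g. under fast-math), the expression below can map
to a single fnmadd.

  // -(a * b) - c matches the AArch64 FNMADD semantics; purely illustrative.
  double fnmadd_like(double a, double b, double c) {
    return -(a * b) - c;
  }
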
Differential Revision: http://reviews.llvm.org/D33101.

llvm-svn: 302822
2017-05-11 20:07:24 +00:00
Quentin Colombet 307e29124c [AArch64][RegisterBankInfo] Change the default mapping of fp stores.
For stores, we check whether the stored value is defined by a floating
point instruction and, if so, return a default mapping with FPR instead
of GPR.

llvm-svn: 302679
2017-05-10 15:19:41 +00:00
Amara Emerson 816542ceb3 [AArch64] Enable use of reduction intrinsics.
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.

The existing code to match shufflevector patterns is replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.

Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 302678
2017-05-10 15:15:38 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with the inalloca attribute creates problems for verification
of the machine representation. This attribute instructs the backend that the
argument is prepared on the stack prior to the CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). The frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However, CALLSEQ_END still keeps the total frame size, as the caller
can be responsible for cleanup of the entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame sizes, and the difference is treated by the
MachineVerifier as a stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get a second mandatory argument. This
  affects all targets that use frame pseudo instructions and touched many
  files, although the changes are uniform.
- Access to frame properties is implemented using dedicated accessor
  methods rather than calls to getOperand(N).getImm(). For X86 and ARM
  such a replacement was made previously.
- Changes that reflect the appearance of the additional argument of the
  frame setup instruction. These involve proper instruction initialization
  and the methods that access instruction arguments.
- MachineVerifier retrieves the frame size using a method that reports the
  sum of the frame parts initialized inside the frame instruction pair and
  outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests that failed with the machine verifier enabled and are
listed in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Quentin Colombet 55a72b3b05 [AArch64][RegisterBankInfo] Change the default mapping of fp loads.
This fixes PR32550, in a way that does not imply running the greedy
mode at O0.

The fix consists in checking whether a load is used by any floating
point instruction and, if so, returning a default mapping with FPR
instead of GPR.

llvm-svn: 302453
2017-05-08 18:16:31 +00:00
Matthias Braun 8940114f61 MIParser/MIRPrinter: Compute block successors if not explicitly specified
- MIParser: If the successor list is not specified, successors will be
  added based on basic block operands in the block and possible
  fallthrough.

- MIRPrinter: Adds a new `simplify-mir` option; with that option set,
  printing of block successor lists is skipped in cases where the
  parser is guaranteed to reconstruct them. We still print the
  list if some successor cannot be determined (this happens for example
  for jump tables), if the successor order changes, or if branch
  probabilities are unequal.

Differential Revision: https://reviews.llvm.org/D31262

llvm-svn: 302289
2017-05-05 21:09:30 +00:00
Aditya Nandakumar 117b667bd9 [GISel]: Add support to translate ConstantVectors
Reviewed by Quentin
https://reviews.llvm.org/D32814

llvm-svn: 302196
2017-05-04 21:43:12 +00:00
Adrian Prantl defc99a94e Cleanup tests to not share a DISubprogram between multiple Functions.
rdar://problem/31926379

llvm-svn: 302166
2017-05-04 16:24:31 +00:00
Chad Rosier 84a238dd62 [DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)).
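
An illustrative before/after at source level (my sketch, not from the patch);
the rewrite trades a multiply for an add and is exact for these operands.

  double before_fold(double a, double b) { return a + b * -2.0; }
  double after_fold(double a, double b)  { return a - (b + b); }
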
Differential Revision: http://reviews.llvm.org/D32596

llvm-svn: 302153
2017-05-04 14:14:44 +00:00
Dean Michael Berris bdfe90050b [XRay] Create an Index of sleds per function
Summary:
This change adds a new section to the xray-instrumented binary that
stores an index into ranges of the instrumentation map, where sleds
associated with the same function can be accessed as an array. At
runtime, we can get access to this index by function ID offset allowing
for selective patching and unpatching by function ID.

Each entry in this new section (xray_fn_idx) will include two pointers
indicating the start and one past the end of the sleds associated with
the same function. These entries will be 16 bytes long on x86 and
aarch64. On arm, we align to 16 bytes anyway so the runtime has to take
that into consideration.
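
A minimal sketch of one index entry (layout assumed from the description
above; this is not the runtime's actual declaration):

  // Two pointers bracketing one function's sleds: 16 bytes on LP64 targets
  // such as x86-64 and AArch64.
  struct XRayFnIdxEntry {
    const void *SledsBegin;  // first sled of the function
    const void *SledsEnd;    // one past the last sled
  };
  static_assert(sizeof(XRayFnIdxEntry) == 2 * sizeof(void *),
                "a pair of pointers");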

__{start,stop}_xray_fn_idx will be the symbols that the runtime will
look for when we implement the selective patching/unpatching by function
id APIs. Because XRay synthesizes the function IDs in a monotonically
increasing manner at runtime now, implementations (and users) can use
this table to look up the sleds associated with a specific function.
This is useful in implementations that want to do things like:

  - Implement coverage mode for functions by patching everything
    pre-main, then as functions are encountered, the installed handler
    can unpatch the function that's been encountered after recording
    that it's been called.
  - Do "learning mode", so that the implementation can figure out some
    statistical information about function calls by function id for a
    time being, and then determine which functions are worth
    uninstrumenting at runtime.
  - Do "selective instrumentation" where an implementation can
    specifically instrument only certain function id's at runtime
    (either based on some external data, or through some other
    heuristics) instead of patching all the instrumented functions at
    runtime.

Reviewers: dblaikie, echristo, chandlerc, javed.absar

Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D32693

llvm-svn: 302109
2017-05-04 03:37:57 +00:00
Joel Jones 6513405735 [AArch64] ILP32 Backend Relocation Support
Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and
  TLSDESC_ADD_LO12 relocations
Rearrange ordering in AArch64.def to follow relocation encoding
Fix name:
  R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC
Add support for several "TLS", "TLSGD", and "TLSLD" relocations for
  ILP32
Fix return values from isNonILP32reloc
Add implementations for
  R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC,
  R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC,
  R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC,
  *TLSLD_LDST128_DTPREL_LO12, *TLSLD_LDST128_DTPREL_LO12_NC,
  *TLSLE_LDST128_TPREL_LO12, *TLSLE_LDST128_TPREL_LO12_NC
Modify error messages to give name of equivalent relocation in the
  ABI not being used, along with better checking for non-existent
  requested relocations.
Added assembler support for "pg_hi21_nc"
Relocation definitions added without implementations:
  R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21,
  R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, 
  R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC,
  R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19,
  R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL,
  R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL,
  R_AARCH64_P32_TLSDESC
Fix encoding:
  R_AARCH64_P32_TLSDESC_ADR_PAGE21

Reviewers: Peter Smith

Patch by: Joel Jones (jjones@cavium.com)

Differential Revision: https://reviews.llvm.org/D32072

llvm-svn: 301980
2017-05-02 22:01:48 +00:00
Zachary Turner a0aae2757d Revert "Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and"
This reverts commit c08155afc5d3230792da2ad30a046a8617735a73.

This is causing undefined symbol errors with some of the constants.

llvm-svn: 301944
2017-05-02 17:51:27 +00:00
Joel Jones 705103e523 Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and
TLSDESC_ADD_LO12 relocations
Rearrange ordering in AArch64.def to follow relocation encoding
Fix name:
  R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC
Add support for several "TLS", "TLSGD", and "TLSLD" relocations for
  ILP32
Fix return values from isNonILP32reloc
Add implementations for
  R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC,
  R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC,
  R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC,
  *TLSLD_LDST128_DTPREL_LO12, *TLSLD_LDST128_DTPREL_LO12_NC,
  *TLSLE_LDST128_TPREL_LO12, *TLSLE_LDST128_TPREL_LO12_NC
Modify error messages to give name of equivalent relocation in the
  ABI not being used, along with better checking for non-existent
  requested relocations.
Added assembler support for "pg_hi21_nc"
Relocation definitions added without implementations:
  R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21,
  R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, 
  R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC,
  R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19,
  R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL,
  R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL,
  R_AARCH64_P32_TLSDESC
Fix encoding:
  R_AARCH64_P32_TLSDESC_ADR_PAGE21

Reviewers: Peter Smith

Patch by: Joel Jones (jjones@cavium.com)

Differential Revision: https://reviews.llvm.org/D32072

llvm-svn: 301939
2017-05-02 17:14:31 +00:00
Sanjoy Das ba0daee6b2 [StackMaps] Increase the size of the "location size" field
Summary:
In some cases LLVM (especially the SLP vectorizer) will create vectors
that are 256 bytes (or larger).  Given that this is intentional[0] is
likely to get more common, this patch updates the StackMap binary
format to deal with the spill locations for said vectors.

This change also bumps the stack map version from 2 to 3.

[0]: https://reviews.llvm.org/D32533#738350

Reviewers: reames, kavon, skatkov, javed.absar

Subscribers: mcrosier, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D32629

llvm-svn: 301615
2017-04-28 04:48:42 +00:00
Jonathan Roelofs 1233fe5ac3 Fix testcase: s/CHECKNEXT/CHECK-NEXT/
llvm-svn: 301098
2017-04-22 23:43:44 +00:00
Daniel Sanders 3016d3c6c9 [globalisel][tablegen] Fix PR32733 by checking which instruction operands belong to.
canMutate() was returning true when the operands were all in the same order as
the matched instruction. However, it wasn't checking the operands were actually
on that instruction. This worked when we could only match a single instruction
but the addition of nested instruction matching led to cases where the operands
could be split across multiple instructions. canMutate() now returns false if
operands belong to instructions other than the root of the match.

llvm-svn: 301077
2017-04-22 14:31:28 +00:00
Matthias Braun d78597ec08 AArch64FrameLowering: Check if the ExtraCSSpill register is actually unused
The code assumed that when saving an additional CSR register
(ExtraCSSpill==true) we would have a free register throughout the
function. This is not true if the CSR register is also used to pass
values, as in the swiftself case.

rdar://31451816

llvm-svn: 301057
2017-04-21 22:42:08 +00:00
Tim Northover 1efaa3a88f AArch64: add test for "fence singlethread"
Forgot a git add yesterday.

llvm-svn: 301037
2017-04-21 20:36:08 +00:00
Akira Hatanaka 22e839f4b2 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 301019
2017-04-21 18:53:12 +00:00
Akira Hatanaka 78ccba6a20 Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

llvm-svn: 300940
2017-04-21 01:31:50 +00:00
Akira Hatanaka 19077aaee0 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300930
2017-04-21 00:05:16 +00:00
Akira Hatanaka 7b06cebe73 Revert "[AArch64] Improve code generation for logical instructions taking"
This reverts r300913.

This broke bots.

llvm-svn: 300916
2017-04-20 23:03:30 +00:00
Akira Hatanaka e327f09832 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300913
2017-04-20 22:47:56 +00:00
Kristof Beyls 0f36e68f62 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defensible design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality.  Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; it also needs quite a few code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300664
2017-04-19 07:23:57 +00:00
Nirav Dave 855ef45602 [DAG] Improve store merge candidate pruning.
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

llvm-svn: 300561
2017-04-18 15:36:34 +00:00
Kristof Beyls a4e79cca77 Revert "[GlobalISel] Support vector-of-pointers in LLT"
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produce slightly different code when LLVM is built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:

remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)

Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.

llvm-svn: 300538
2017-04-18 09:26:36 +00:00
Kristof Beyls fb73eb0324 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; it also needs quite a few code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300535
2017-04-18 08:12:45 +00:00
Tim Northover 46e36f0953 AArch64: put nonlazybind special handling behind a flag for now.
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.

llvm-svn: 300474
2017-04-17 18:18:47 +00:00
Tim Northover 879a0b2e1b AArch64: support nonlazybind
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.

llvm-svn: 300462
2017-04-17 17:27:56 +00:00
Adam Nemet c5779460f4 [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16
This further improves Ahmed's change in rL299482.  See the new comment for the
rationale.

The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.

Differential Revision: https://reviews.llvm.org/D32028

llvm-svn: 300276
2017-04-13 23:32:47 +00:00
Volkan Keles 64ad85f8ba [GlobalISel] LegalizerInfo: Enable legalization of non-power-of-2 types
Summary: Legalize only if the type is marked as Legal or Custom. If not, return Unsupported as LegalizerHelper is not able to handle non-power-of-2 types right now.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, kristof.beyls, javed.absar, ab

Reviewed By: kristof.beyls, ab

Subscribers: dberris, rovka, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31711

llvm-svn: 299929
2017-04-11 10:10:14 +00:00
Kyle Butt ee51a20164 CodeGen: BlockPlacement: Minor probability changes.
Qin may be large, and Succ may be more frequent than BB. Take these both into
account when deciding if tail-duplication is profitable.

llvm-svn: 299891
2017-04-10 22:28:18 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Aditya Nandakumar eb80a51b52 [GlobalISel]: Fix bug where we can report GISelFailure on erased instructions
The original instruction might get legalized and erased and expanded
into intermediate instructions and the intermediate instructions might
fail legalization. This end up in reporting GISelFailure on the erased
instruction.
Instead report GISelFailure on the intermediate instruction which failed
legalization.

Reviewed by: ab

llvm-svn: 299802
2017-04-07 21:49:30 +00:00
Petr Hosek c3a9e6db38 [AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18
When using -ffixed-x18, the x18 (or w18) register can safely be used
with the "global register variable" GCC extension, but the backend
fails to recognize it.
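
A hedged example of the extension in question (illustrative only; assumes an
AArch64 target compiled with -ffixed-x18):

  // Global register variable pinned to x18; with -ffixed-x18 the backend
  // now accepts the register name instead of rejecting it.
  register unsigned long reserved_ptr asm("x18");

  unsigned long read_reserved_ptr(void) { return reserved_ptr; }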

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D31793

llvm-svn: 299799
2017-04-07 20:41:58 +00:00
Eli Friedman 5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Adam Nemet d5ffdd3605 [DAGCombine] Support FMF contract in fused multiple-and-sub too
This is a follow-on to r299096 which added support for fmadd.

Subtract does not have the case where, with two multiply operands, we commute
in order to fuse with the multiply that has fewer uses.

llvm-svn: 299572
2017-04-05 17:58:48 +00:00
Ahmed Bougacha d3c03a5ddd [AArch64] Avoid partial register deps on insertelt of load into lane 0.
This improves upon r246462: that prevented FMOVs from being emitted
for the cross-class INSERT_SUBREGs by disabling the formation of
INSERT_SUBREGs of LOAD.  But the ld1.s that we started selecting
caused us to introduce partial dependencies on the vector register.

Avoid that by using SCALAR_TO_VECTOR: it's a first-class citizen that
is folded away by many patterns, including the scalar LDRS that we
want in this case.
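
An illustrative pattern (my example, using Clang/GCC vector extensions, not
from the commit) where the scalar load feeding lane 0 previously introduced a
partial dependency on the destination vector register:

  typedef float float32x4 __attribute__((vector_size(16)));

  // Load a float and insert it into lane 0; selecting the scalar LDRS via
  // SCALAR_TO_VECTOR avoids the partial register write.
  float32x4 insert_lane0(float32x4 v, const float *p) {
    v[0] = *p;
    return v;
  }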

Credit goes to Adam for finding the issue!

llvm-svn: 299482
2017-04-04 22:55:53 +00:00
Petr Hosek 9eb0a1e09b [AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia
This mode is just like -mcmodel=small except that it moves the
thread pointer from TPIDR_EL0 to TPIDR_EL1.

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D31624

llvm-svn: 299462
2017-04-04 19:51:53 +00:00
Daniel Sanders bee5739a7c [tablegen][globalisel] Add support for nested instruction matching.
Summary:
Lift the restrictions that prevented the tree walking introduced in the
previous change and add support for patterns like:
  (G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3
Also adds support for G_SEXT and G_ZEXT to support these cases.
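
As a hedged source-level illustration of that shape (mine, not from the
patch): a multiply of sign-extended operands feeding an add, which can now be
matched as a single widening multiply-add on AArch64.

  long long sext_mul_add(int a, int b, long long acc) {
    return (long long)a * (long long)b + acc;
  }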

One particular aspect of this that I should draw attention to is that I've
tried to be overly conservative in determining the safety of matches that
involve non-adjacent instructions and multiple basic blocks. This is intended
to be used as a cheap initial check and we may add a more expensive check in
the future. The current rules are:
* Reject if any instruction may load/store (we'd need to check for intervening
  memory operations).
* Reject if any instruction has implicit operands.
* Reject if any instruction has unmodelled side-effects.
See isObviouslySafeToFold().

Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka

Reviewed By: ab

Subscribers: igorb, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30539

llvm-svn: 299430
2017-04-04 13:25:23 +00:00
Jun Bum Lim dee5565869 [CodeGenPrep] move aarch64-type-promotion to CGP
Summary:
Move the aarch64-type-promotion pass into the existing type promotion framework in CGP.
This change also supports forking sexts when a new sext is required for promotion.
Note that this change is based on D27853, and I am submitting it early to provide a better idea of D27853.

Reviewers: jmolloy, mcrosier, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, aemerson, rengolin, mcrosier

Differential Revision: https://reviews.llvm.org/D28680

llvm-svn: 299379
2017-04-03 19:20:07 +00:00
Quentin Colombet fc8f048c13 Revert "Localizer fun"
This reverts commit r299283.

Didn't intend to commit this :(

llvm-svn: 299287
2017-04-01 01:26:21 +00:00
Quentin Colombet 7f64318938 [RegBankSelect] Support REG_SEQUENCE for generic mapping
REG_SEQUENCE falls into the same category as COPY for operands mapping:
- They don't have MCInstrDesc with register constraints
- The input variable could use whatever register classes
- It is possible to have register class already assigned to the operands

In particular, given REG_SEQUENCE are always target specific because of
the subreg indices. Those indices must apply to the register class of
the definition of the REG_SEQUENCE and therefore, the target must set a
register class to that definition. As a result, the generic code can
always use that register class to derive a valid mapping for a
REG_SEQUENCE.

llvm-svn: 299285
2017-04-01 01:26:14 +00:00
Quentin Colombet 3c40b366c5 Localizer fun
WIP

llvm-svn: 299283
2017-04-01 01:21:28 +00:00
Balaram Makam 2aba753e84 [AArch64] Add new subtarget feature to fold LSL into address mode.
Summary:
This feature enables folding logical shift operations of up to 3 places into the addressing mode on Kryo and Falkor, which have a fast-path LSL.
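
For illustration (my example, not from the patch), an indexed load whose
shift-by-2 can stay folded into the addressing mode (roughly
'ldr w0, [x0, x1, lsl #2]'):

  int load_indexed(const int *base, long i) { return base[i]; }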

Reviewers: mcrosier, rengolin, t.p.northover

Subscribers: junbuml, gberry, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31113

llvm-svn: 299240
2017-03-31 18:16:53 +00:00
Adam Nemet edaec6de73 [DAGCombiner] Initial support for the fast-math flag contract
Now, as an alternative to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per-operation FMF to allow fusion.

The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but to make this work all the way from
clang.

The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.
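
An illustrative candidate (mine, not from the patch): when both the multiply
and the add carry the contract flag (e.g. via -ffp-contract=fast from clang),
the pair below may be fused into a single fma.

  double mul_add(double a, double b, double c) { return a * b + c; }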

Differential Revision: https://reviews.llvm.org/D31169

llvm-svn: 299096
2017-03-30 18:53:04 +00:00
Ahmed Bougacha 6dd6082472 [CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.

Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.

SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.

Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).

Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed.  For now, use
'isEnabled' as a crude replacement.

This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'.  Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.

This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.

Differential Revision: https://reviews.llvm.org/D31405

llvm-svn: 299093
2017-03-30 17:49:58 +00:00
Eric Christopher 69b191c628 Add a similar test for tailcall optimization as in r270287 for aarch64.
llvm-svn: 298952
2017-03-28 22:37:43 +00:00
Ahmed Bougacha f75782f9dc [GlobalISel][AArch64] Fold FI into LDR/STR ui addressing mode.
A majority of loads and stores at O0 access an alloca.

It's trivial to fold the G_FRAME_INDEX into the instruction; do it.
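
A minimal illustration (mine) of the common -O0 pattern:

  // At -O0 'local' lives in an alloca; the frame index of that stack slot
  // can be folded directly into the LDR/STR addressing mode.
  int stack_slot() {
    int local = 42;
    return local;
  }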

llvm-svn: 298864
2017-03-27 17:31:56 +00:00
Ahmed Bougacha 8a654085d0 [GlobalISel][AArch64] Fold G_GEP into LDR/STR ui addressing mode.
We're not to the point of supporting the load/store patterns yet
(because they extensively use PatFrags).

But in the meantime, we can implement some of the simplest addressing
modes.

llvm-svn: 298863
2017-03-27 17:31:52 +00:00
Ahmed Bougacha 85a66a6d9f [GlobalISel][AArch64] Select store of zero to WZR/XZR.
These occur very frequently, and are quite trivial to catch.
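
For illustration (mine, not from the commit):

  // Storing constant zero can use the zero register directly, roughly
  // 'str wzr, [x0]', without materializing an immediate.
  void clear(int *p) { *p = 0; }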

llvm-svn: 298862
2017-03-27 17:31:48 +00:00
Ahmed Bougacha 641cb203b6 [GlobalISel][AArch64] Select CBZ.
CBZ/CBNZ represent a substantial portion of all conditional branches.
Look through G_ICMP to select them.
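
An illustrative source pattern (mine) that can now be selected as a single
CBZ:

  void handle_zero();  // assumed helper, for illustration only

  // The compare-against-zero feeding the conditional branch is matched
  // through G_ICMP and becomes CBZ.
  void branch_on_zero(long x) {
    if (x == 0)
      handle_zero();
  }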

We can't use tablegen yet because the existing patterns match an
AArch64ISD node.

llvm-svn: 298856
2017-03-27 16:35:31 +00:00
Ahmed Bougacha c1cbcee170 [GlobalISel][AArch64] Use proper constant types in test. NFC.
llvm-svn: 298854
2017-03-27 16:35:23 +00:00
Chad Rosier 862a41270f [AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as not having side effects.
Among other things, this allows Machine LICM to hoist a costly 'mrs'
instruction from within a loop.
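
A hedged illustration (mine, not from the patch): each access to a
thread_local variable reads the thread pointer, and modeling the mrs as
side-effect free lets MachineLICM hoist that read out of the loop.

  thread_local int counter;

  void bump(int n) {
    for (int i = 0; i < n; ++i)
      counter += i;  // thread-pointer read can be hoisted out of the loop
  }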

Differential Revision: http://reviews.llvm.org/D31151

llvm-svn: 298851
2017-03-27 15:52:38 +00:00
Aditya Nandakumar bc389badbc [GlobalISel]: Create VREGs for ConstantInt args
This patch changes the behavior of IRTranslating intrinsics where we
now create VREG + G_CONSTANT for ConstantInt values. We already do this
for FloatingPoint values. This makes it easier for the backends to
select code, and they won't have to de-duplicate the creation and selection
of constants.

Reviewed by: ab

llvm-svn: 298473
2017-03-22 01:16:39 +00:00
Ahmed Bougacha 15b3e8a93a [GlobalISel] Update DBG_VALUEs referencing DCE'd instructions.
Quentin points out that r298358 would cause us to emit different code
with debug info.  That's a big no-no; also erase the instructions that
only live thanks to DBG_VALUE users.

Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality.  I'm not sure there is a better scheme.

llvm-svn: 298460
2017-03-21 23:42:54 +00:00
Ahmed Bougacha e8e1fa3a7c [GlobalISel] Don't translate br to layout successor.
MI can represent fallthrough to layout successor blocks, and our
post-isel representation uses that extensively.

We might as well use it too, to avoid translating and carrying along
unnecessary branches.

llvm-svn: 298459
2017-03-21 23:42:50 +00:00
Tim Northover dd4b9d6d7b GlobalISel: widen booleans by zero-extending to a byte.
A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.
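
For illustration (mine, not from the commit):

  // Under the ABI the returned bool must be exactly 0 or 1 in the low byte,
  // so the i1 result is widened with a zero-extension rather than G_ANYEXT.
  bool is_negative(int x) { return x < 0; }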

llvm-svn: 298439
2017-03-21 21:12:04 +00:00
Volkan Keles 044e003203 [GlobalISel] Fix shufflevector tests
clang-lld-x86_64-2stage fails because of the order
of the instructions. `CHECK-DAG` directives should
fix the problem.

llvm-svn: 298367
2017-03-21 13:12:59 +00:00
Volkan Keles 75bdc7690e [GlobalISel] Translate shufflevector
Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders

Reviewed By: javed.absar

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30962

llvm-svn: 298347
2017-03-21 08:44:13 +00:00
Tim Northover 4340d64f91 GlobalISel: add implicit defs & uses when mutating an instruction.
Otherwise a scheduler might do bad things to the code we produce.

llvm-svn: 298311
2017-03-20 21:58:23 +00:00
Nirav Dave f5f0864ac2 Add test case for merging of chained stores of mismatched type.
llvm-svn: 298293
2017-03-20 19:48:22 +00:00
Tim Northover 89268b183f GlobalISel: allow quad-precision values to be dumped.
Otherwise the fallback path fails with an assertion on AAPCS AArch64 targets,
when "long double" is encountered.

llvm-svn: 298273
2017-03-20 16:52:08 +00:00
Diana Picus d79253a9f7 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

llvm-svn: 298254
2017-03-20 14:40:18 +00:00
Ahmed Bougacha 931904d777 [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
2017-03-19 16:13:00 +00:00
Ahmed Bougacha 48bcd22ce8 [GlobalISel][AArch64] Add DBG_VALUE select test. NFC.
llvm-svn: 298223
2017-03-19 16:12:53 +00:00
Ahmed Bougacha dcd416a4b9 [GlobalISel][AArch64] Split out cast select tests. NFC.
And remove some redundant bitcast tests.

Also split the test functions themselves: it makes it obvious to see
what's tested where and what isn't, it makes the tests much easier to
read and manually update, and, most importantly, it makes them almost
trivial to update using tooling.  Yes, it's obnoxiously verbose, but
said tooling helps upgrade to better MIR syntax whenever available.

llvm-svn: 298222
2017-03-19 16:12:51 +00:00
Jessica Paquette ea8cc09be0 [Outliner] Add outliner for AArch64
This commit adds the necessary target hooks for outlining in AArch64. It also
refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a
more general function, `getMemOpInfo`. This allows the outliner to share that
code without copying and pasting it.

The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with
the X86-64 outliner.

The test for this pass verifies that the outliner does, in fact, outline
functions, fixes up the stack accesses properly, and can correctly generate a
tail call. In the future, this test should be replaced with a MIR test, so that
we can properly test immediate offset overflows in fixed-up instructions.

llvm-svn: 298162
2017-03-17 22:26:55 +00:00
Jun Bum Lim 4230101def [CodeGenPrep]Restructure promoting Ext to form ExtLoad
Summary:
Instead of just looking for a load which is mergeable with an Ext to form an ExtLoad, try to promote Exts as long as the cost is acceptable. This change is not an NFC, as it continues promoting Exts even after finding a load during promotion; the change in arm64-codegen-prepare-extload.ll described in 2.b may show such a case.
This change was motivated by D26524. Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.
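
A hedged sketch of the kind of code affected (mine; whether promotion fires
depends on the usual legality and cost checks):

  // The sign extension of the add can be promoted toward the load, exposing
  // a sign-extending load (e.g. ldrsw) followed by a 64-bit add.
  long long widen(const int *p) {
    int v = *p + 1;
    return (long long)v;
  }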

Reviewers: jmolloy, qcolombet, mcrosier, javed.absar

Reviewed By: qcolombet

Subscribers: rengolin, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27853

llvm-svn: 298114
2017-03-17 19:05:21 +00:00
Chad Rosier a69dcb6b66 [AArch64] Use alias analysis in the load/store optimization pass.
This allows the optimization to rearrange loads and stores more aggressively.

Differential Revision: http://reviews.llvm.org/D30903

llvm-svn: 298092
2017-03-17 14:19:55 +00:00
Daniel Sanders 16846764d0 [globalisel] Correct one more simple immediate that should be a ConstantInt.
llvm-svn: 297979
2017-03-16 19:59:19 +00:00
Daniel Sanders 0e64202871 [globalisel] Correct G_CONSTANT path of selectArithImmed()
Earlier stages of GlobalISel always use ConstantInt in G_CONSTANT so that's
what we should check for.

This fixes a crash introduced in r297782.

llvm-svn: 297968
2017-03-16 18:04:50 +00:00
Ahmed Bougacha 2fb8030748 [GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS.
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.

When we're emitting synthetic constants, we do need to get Constants from
the context.  One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless.  We currently do this for extractvalue and all.

For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.

llvm-svn: 297875
2017-03-15 19:21:11 +00:00
Ahmed Bougacha 62cd73d989 [GlobalISel][AArch64] Select ADDXri.
We're now able to select ADDWri thanks to the new complex pattern
support.  Extend that to ADDXri.

llvm-svn: 297874
2017-03-15 19:20:59 +00:00
Ahmed Bougacha 07f247b6c2 [GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch's parent block.

llvm-svn: 297869
2017-03-15 18:22:37 +00:00
Ahmed Bougacha a61c214f51 [GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable;  the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.

getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything.  Rename it and extract the MBB creation logic out of it.

A couple tests were sensitive to the order. Update them.

llvm-svn: 297868
2017-03-15 18:22:33 +00:00
Ahmed Bougacha 1a6deeefe0 [GlobalISel][AArch64] Add back constant select tests. NFC.
More of r297856.

llvm-svn: 297859
2017-03-15 16:51:41 +00:00
Ahmed Bougacha d691cf731c [GlobalISel][AArch64] Use appropriate test function names. NFC.
These FP tests are on FPR, not GPR.  Don't lie in the name.

llvm-svn: 297857
2017-03-15 16:29:40 +00:00
Ahmed Bougacha 170778f0db [GlobalISel][AArch64] Split out select tests. NFC.
The test has grown enough to be annoying to navigate.
While there, remove unnecessary RUNs and clean up a couple of comments.

llvm-svn: 297856
2017-03-15 16:29:37 +00:00
Simon Pilgrim 018eedd9a5 [SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273)
llvm-svn: 297852
2017-03-15 16:22:24 +00:00
Simon Pilgrim a5f332edd1 [SelectionDAG][AArch64] Add test case showing incorrect SelectionDAG::ComputeNumSignBits BUILD_VECTOR handling
Reduced from a mixture of PR32273 and David Green's test cases showing SelectionDAG::ComputeNumSignBits not correctly handling BUILD_VECTOR implicit truncation of inputs.

llvm-svn: 297847
2017-03-15 15:40:34 +00:00
Peter Collingbourne 7f6e2c97b8 Ensure that prefix data is preserved with subsections-via-symbols
On MachO platforms that use subsections-via-symbols, dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data its own symbol, and mark the actual function
entry as an .alt_entry.

Patch by Moritz Angermann!

Differential Revision: https://reviews.llvm.org/D30770

llvm-svn: 297804
2017-03-15 04:18:16 +00:00
Volkan Keles 4862c63594 [GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors
Summary:
<1 x Ty> is not a legal vector type in LLT, so we shouldn't build a
G_MERGE_VALUES instruction for it.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30948

llvm-svn: 297792
2017-03-14 23:45:06 +00:00
Daniel Sanders 8a4bae9993 [globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.

Depends on D30046, D29712

Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30089

llvm-svn: 297782
2017-03-14 21:32:08 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates non-interfering loads/stores from the store-merging
    logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations which require
    more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward load-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit, as in
         some contexts invalid 64-bit operations are being generated.
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Volkan Keles 38a91a0de6 GlobalISel: Translate ConstantDataVector
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab

Reviewed By: qcolombet, dsanders, ab

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30216

llvm-svn: 297670
2017-03-13 21:36:19 +00:00
Tim Northover 55e6f10d69 Revert "GlobalISel: move vector extract/insert inside generic opcode region."
I was writing against an earlier branch and Volkan had already fixed this.

llvm-svn: 297668
2017-03-13 21:25:10 +00:00
Tim Northover 0f1d32d557 GlobalISel: move vector extract/insert inside generic opcode region.
Otherwise they won't be legalized or selected, causing instruction selection to
fail horribly.

llvm-svn: 297666
2017-03-13 21:18:59 +00:00