Commit Graph

29326 Commits

Author SHA1 Message Date
Matt Arsenault 61ced4b87a GlobalISel: Handle 'n' inline asm constraint 2020-07-26 09:30:41 -04:00
Changpeng Fang 9162b70e51 DAGCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit
Summary:
  In parallelizeChainedStores, a TokenFactor was created with more than 3000 operands.
We found that DAGCombiner::visitTokenFactor consumes a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification in this case to save compile time.
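A minimal C++ sketch of the guard described above, assuming a simplified node model. TokenFactorModel, shouldSkipTokenFactorSimplification and the limit value 2048 are illustrative assumptions, not the actual DAGCombiner code or its configured default:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed value for illustration only; the real limit is the
// TokenFactorInlineLimit option in DAGCombiner.
constexpr std::size_t kAssumedTokenFactorInlineLimit = 2048;

// Hypothetical minimal model of a TokenFactor: just its chain operands.
struct TokenFactorModel {
  std::vector<int> ChainOperands;
};

// Returns true when the (potentially quadratic) operand-merging walk should
// be skipped because the node is already wider than the inline limit.
bool shouldSkipTokenFactorSimplification(
    const TokenFactorModel &TF,
    std::size_t Limit = kAssumedTokenFactorInlineLimit) {
  return TF.ChainOperands.size() > Limit;
}
```

The check is a pure compile-time guard: a node that is already too wide to be inlined gains little from further merging, so the expensive walk is simply not attempted.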

Reviewers:
  @spatel, @arsenm

Differential Revision:
  https://reviews.llvm.org/D84204
2020-07-25 21:20:59 -07:00
Eric Christopher 18975762c1 Fold StatepointBB into checks as it's only used from an NDEBUG or ASSERT
context, fixing an unused variable warning.
2020-07-25 18:36:53 -07:00
Philip Reames 55dae9c20c [Statepoints] Style cleanup after 3da1a963 [NFC]
Just fixing a few minor stylistic issues.
2020-07-25 16:40:39 -07:00
Philip Reames 3da1a9634e [Statepoints] Support lowering gc relocations to virtual registers
(Disabled under flag for the moment)

This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator.  Today, we force spill all operands in SelectionDAG.  The code to do so is distinctly non-optimal.  The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to.  In terms of performance, the latter part is actually more important as it avoids redundant shuffling of values between stack slots.

This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above.  In particular, the first N relocations are lowered to variadic tied def/use pairs.  So the new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...

N is limited by the maximum number of tied registers a machine instruction can have (15 at the moment).
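A minimal C++ sketch of how the first N GC pointer pairs might be selected for tied-def lowering under the 15-register limit. GCPair and partitionForTiedDefs are hypothetical names, and the assumption that the remaining pairs keep using the existing spill-based lowering is ours, not stated by the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Limit quoted in the commit message (15 tied registers at the time).
constexpr std::size_t kMaxTiedDefs = 15;

// A (base, derived) GC pointer pair, identified by hypothetical value ids.
struct GCPair {
  int Base;
  int Derived;
};

// first:  pairs lowered as tied def/use operands
//         (reloc_i = STATEPOINT ..., base_i, derived_i<tied-def i>, ...)
// second: the remainder, assumed here to stay on the legacy spill path.
std::pair<std::vector<GCPair>, std::vector<GCPair>>
partitionForTiedDefs(const std::vector<GCPair> &Pairs,
                     std::size_t MaxTied = kMaxTiedDefs) {
  const auto N = static_cast<std::ptrdiff_t>(std::min(Pairs.size(), MaxTied));
  std::vector<GCPair> Tied(Pairs.begin(), Pairs.begin() + N);
  std::vector<GCPair> Legacy(Pairs.begin() + N, Pairs.end());
  return {Tied, Legacy};
}
```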

The current patch is restricted to handling relocations within a single basic block.  Cross block relocations (e.g. invokes) are handled via the legacy mechanism.  This restriction will be relaxed in future patches.

Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
2020-07-25 14:26:05 -07:00
Matt Arsenault 4b53072ee5 GlobalISel: Define mulfix/divfix opcodes
The full expansion involves the funnel shifts, which depend on another
patch to expand those.
2020-07-24 20:02:20 -04:00
Nicolai Hähnle 5934df0c9a MachineBasicBlock: add printName method
Common up some existing MBB name printing logic into a single place.
Note that basic block dumping now prints the same set of attributes as
the MIRPrinter.

Change-Id: I8f022bbd922e831bc96d63143d7472c03282530b

Differential Revision: https://reviews.llvm.org/D83253
2020-07-24 18:18:09 +02:00
Djordje Todorovic 6371a0a00e [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE
Emit DWARF 5 call-site symbols even though DWARF 4 is set,
only in the case of LLDB tuning.

This patch addresses PR46643.

Differential Revision: https://reviews.llvm.org/D83463
2020-07-24 14:33:57 +02:00
Simon Pilgrim 0128b9505c Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI."
This reverts commit 5dd566b7c7.

Causing some buildbot failures that I'm not seeing on MSVC builds.
2020-07-24 13:02:33 +01:00
Simon Pilgrim 5dd566b7c7 PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI.
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.

This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
2020-07-24 12:40:50 +01:00
Djordje Todorovic cbb3571b0d [DWARF] Avoid entry_values production for SCE
The SONY debugger does not prefer the debug entry values feature, so
the plan is to avoid producing entry values
by default when tuning for the SCE debugger.

The feature can still be enabled with the -debug-entry-values
option for testing/development purposes.

This patch addresses PR46643.

Differential Revision: https://reviews.llvm.org/D83462
2020-07-24 13:34:05 +02:00
Craig Topper 8131e19064 [LegalizeTypes] Teach DAGTypeLegalizer::GenWidenVectorLoads to pad with undef if needed when concatenating smaller loads to match a larger load
In the included test case the align 16 allowed the v23f32 load to be handled as load v16f32, load v4f32, and load v4f32 (one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need two v4f32 undefs to pad it.

It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. This was originally done in r147964 by padding all loads narrower than previous loads to the same size, and later modified to pad only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it.
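The padding arithmetic from the v23f32 example can be sketched in C++ as follows. undefPiecesNeeded is a hypothetical helper working on element counts only; it is not the actual GenWidenVectorLoads code:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Given the element counts of equally-typed vector pieces (e.g. two v4f32
// loads) and the element count they must reach to match a wider piece in a
// second concat_vectors (e.g. v16f32), return how many undef pieces of the
// same type must be appended.
unsigned undefPiecesNeeded(const std::vector<unsigned> &PieceElts,
                           unsigned TargetElts) {
  assert(!PieceElts.empty());
  const unsigned Width = PieceElts.front();
  for (unsigned W : PieceElts)
    assert(W == Width && "concat_vectors operands must share one type");
  const unsigned Have =
      std::accumulate(PieceElts.begin(), PieceElts.end(), 0u);
  assert(Have <= TargetElts && (TargetElts - Have) % Width == 0);
  return (TargetElts - Have) / Width;
}
```

For the test case, two v4f32 pieces add up to only 8 of the 16 lanes, so two v4f32 undefs are needed.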

Fixes PR46820

Differential Revision: https://reviews.llvm.org/D84463
2020-07-23 19:02:03 -07:00
Matt Arsenault 891759db73 GlobalISel: Add scalarSameSizeAs LegalizeRule
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
2020-07-23 21:17:31 -04:00
Amara Emerson 645e7fc542 [GlobalISel] Use existing MIR builder instead of creating one in combiner. 2020-07-23 14:16:45 -07:00
Amara Emerson 3b10e42ba1 [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-23 12:06:35 -07:00
Nikita Popov deb4bb2b3a [IR] Add min/max/abs intrinsics
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
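A C++ sketch of the reference semantics of the new intrinsics on i32, written as plain functions rather than LLVM APIs. It assumes the LangRef definition in which llvm.abs takes an i1 is_int_min_poison flag; when that flag is false, abs(INT_MIN) wraps back to INT_MIN, which is what abs32Bits models:

```cpp
#include <cassert>
#include <cstdint>

// Signed vs. unsigned min/max differ only in how the same bits are compared.
int32_t smin32(int32_t A, int32_t B) { return A < B ? A : B; }
int32_t smax32(int32_t A, int32_t B) { return A > B ? A : B; }
uint32_t umin32(uint32_t A, uint32_t B) { return A < B ? A : B; }
uint32_t umax32(uint32_t A, uint32_t B) { return A > B ? A : B; }

// llvm.abs with is_int_min_poison == false: computed in unsigned arithmetic
// so that abs(INT32_MIN) wraps to 0x80000000 instead of hitting C++ UB.
uint32_t abs32Bits(int32_t X) {
  const uint32_t U = static_cast<uint32_t>(X);
  return X < 0 ? 0u - U : U;
}
```

The umin32 case shows why separate signed and unsigned intrinsics are needed: the bit pattern of -1 is the largest unsigned value.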

Differential Revision: https://reviews.llvm.org/D84125
2020-07-23 20:56:19 +02:00
Mircea Trofin 302e91baf4 [llvm][NFC] Add comments and common-case API to MachineBlockFrequencyInfo
Clarify the relation between a block's BlockFrequency and the
getEntryFreq() API, and add an API for the relatively common case of
finding a block's frequency relative to the entry point.

Added / moved some comments to the header.

Differential Revision: https://reviews.llvm.org/D84357
2020-07-23 08:42:34 -07:00
Evgeny Leviant dc619f3d7a [CodeGen][TargetPassConfig] Add unreachable-mbb-elimination pass explicitly
Differential revision: https://reviews.llvm.org/D84228
2020-07-23 18:05:11 +03:00
Jay Foad b35833b84e [GlobalISel][AMDGPU] Legalize saturating add/subtract
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.

Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skip out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
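A C++ sketch of the two lowering strategies for a signed saturating add on i32. It illustrates the arithmetic only, not the exact instruction sequence LegalizerHelper emits; the min/max variant uses a standard clamp formulation, and __builtin_add_overflow is a GCC/Clang builtin standing in for add-with-overflow:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Strategy 1: add-with-overflow. On overflow both operands share a sign,
// so saturate toward that sign.
int32_t saddsatViaOverflow(int32_t A, int32_t B) {
  int32_t Sum;
  if (__builtin_add_overflow(A, B, &Sum))
    return B > 0 ? std::numeric_limits<int32_t>::max()
                 : std::numeric_limits<int32_t>::min();
  return Sum;
}

// Strategy 2: min/max. Clamp B into the range that A can absorb without
// overflowing, then do a plain (now non-overflowing) add.
int32_t saddsatViaMinMax(int32_t A, int32_t B) {
  const int32_t Min = std::numeric_limits<int32_t>::min();
  const int32_t Max = std::numeric_limits<int32_t>::max();
  const int32_t Lo = Min - std::min<int32_t>(A, 0); // smallest safe B
  const int32_t Hi = Max - std::max<int32_t>(A, 0); // largest safe B
  return A + std::clamp(B, Lo, Hi);
}
```

Both functions compute the same result; which one a target prefers depends on whether it has cheap overflow flags or cheap min/max instructions.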
2020-07-23 09:06:42 -04:00
Simon Pilgrim 1003113ef0 Fix -Wparentheses warning - add missing brackets around the entire assertion condition 2020-07-23 13:33:24 +01:00
Konstantin Schwarz 931488779f [GlobalISel][InlineAsm] Add register class ID to the flags of register input operands
Summary: We do this already for output operands, but missed it for (non-tied) input operands.

Reviewers: arsenm, Petar.Avramovic

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83763
2020-07-23 13:35:01 +02:00
Florian Hahn 6c9da995fc [ScheduleDAGRRList] Pacify overload mismatch in std::min.
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
2020-07-23 11:56:50 +01:00
Florian Hahn 2f8e6b5f3c [ScheduleDAGRRList] Limit number of candidates to explore.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.

define void @test(i4000000* %ptr) {
entry:
  store i4000000 0, i4000000* %ptr, align 4
  ret void
}

This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.
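A minimal C++ sketch of a bounded best-candidate scan, assuming each candidate carries a precomputed score where higher is better. SUnitCandidate and popBestLimited are hypothetical; the real popFromQueueImpl compares candidates with a picker object rather than a score:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scheduling candidate with a precomputed score.
struct SUnitCandidate {
  int Id;
  int Score;
};

// Look at no more than MaxCandidates entries and pop the best one found.
// Candidates beyond the limit are simply not considered on this pop.
SUnitCandidate popBestLimited(std::vector<SUnitCandidate> &Q,
                              std::size_t MaxCandidates = 1000) {
  assert(!Q.empty());
  const std::size_t Limit = std::min(Q.size(), MaxCandidates);
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Limit; ++I)
    if (Q[I].Score > Q[Best].Score)
      Best = I;
  SUnitCandidate Result = Q[Best];
  Q.erase(Q.begin() + static_cast<std::ptrdiff_t>(Best));
  return Result;
}
```

The trade-off is that a better candidate past the limit can be missed, which is why the limit was chosen high enough not to trigger on the benchmarks mentioned above.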

It would be even better to use a binary heap to manage the queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.

The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.

Reviewers: efriedma, paquette, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84328
2020-07-23 11:35:33 +01:00
Sourabh Singh Tomar 8998f8ab66 [DebugInfo] Attempt to fix regression test failure after 59a76d957a
Test case `test/CodeGen/WebAssembly/stackified-debug.ll`
was failing due to malformed DwarfExpression.

This failure has been seen on many bots, for instance:
http://lab.llvm.org:8011/builders/lld-x86_64-ubuntu-fast/builds/18794

: 'RUN: at line 1'
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/llc
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/FileCheck /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll
home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll:26:10: error: CHECK: expected string not found in input
 CHECK: .int16 4 # Loc expr size
         ^
<stdin>:34:2: note: scanning from here
 .int16 3 # Loc expr size

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 14:55:30 +05:30
Sourabh Singh Tomar 59a76d957a Re-apply:" Emit DW_OP_implicit_value for Floating point constants"
This patch was reverted in 9d2da6759b due to assertion failure seen
in `test/DebugInfo/Sparc/subreg.ll`. Assertion failure was happening
due to malformed/unhandled DwarfExpression.

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 13:56:20 +05:30
Sourabh Singh Tomar 9d2da6759b Revert "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants"
This reverts commit 6b55a95898.
Temporary revert due to an assertion failure in a test case in the Sparc backend:
`test/DebugInfo/Sparc/subreg.ll`
Seen on many bots, for instance:
`http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24679`
2020-07-23 08:50:01 +05:30
Sourabh Singh Tomar 6b55a95898 [DebugInfo] Emit DW_OP_implicit_value for Floating point constants
Summary:
LLVM is missing support for the DW_OP_implicit_value operation.
DW_OP_implicit_value is indispensable for cases such as
optimized-out long double variables.

For an introduction, refer to the DWARFv5 spec, page 40, section 2.6.1.1.4, Implicit Location Descriptions.

Consider the following example:
```
int main() {
        long double ld = 3.14;
        printf("dummy\n");
        ld *= ld;
        return 0;
}
```
When compiled with trunk `clang` as
`clang test.c -g -O1`, it produces the following location description
for variable `ld`:
```
DW_AT_location        (0x00000000:
                     [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
                  DW_AT_name    ("ld")
```
Note that this representation is incorrect: the DWARF 4 stack can only
hold integers, and only up to the size of an address, while the
variable itself is `128` bits wide.
GDB and LLDB confirm this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```

GCC uses DW_OP_implicit_value in these sorts of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.

Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html

GDB seems happy after this patch (LLDB doesn't have support
for DW_OP_implicit_value):
```
(gdb) p ld
$1 = 3.14000000000000012434
```
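A C++ sketch of how a DW_OP_implicit_value expression is laid out: the opcode (0x9e in the DWARF specification), a ULEB128 byte count, and then the raw bytes of the value. implicitValueExpr and appendULEB128 are illustrative helpers, not LLVM's DwarfExpression API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kDW_OP_implicit_value = 0x9e; // DWARF opcode value

// Unsigned LEB128: 7 payload bits per byte, high bit set while more follow.
void appendULEB128(std::vector<uint8_t> &Out, uint64_t V) {
  do {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
}

// DW_OP_implicit_value <ULEB128 size> <size bytes of the value>.
std::vector<uint8_t> implicitValueExpr(const void *Data, std::size_t Size) {
  std::vector<uint8_t> Expr{kDW_OP_implicit_value};
  appendULEB128(Expr, Size);
  const auto *Bytes = static_cast<const uint8_t *>(Data);
  Expr.insert(Expr.end(), Bytes, Bytes + Size);
  return Expr;
}
```

Because the value is stored as raw bytes, it is not limited to the address-sized integers of the DWARF expression stack, which is what makes it suitable for a 128-bit `long double`.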

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 07:21:49 +05:30
Christopher Tetreault ae35c09c34 [MVT] Fix getTypeForEVT for v64f16 and v128f16
Summary: These should have half float as the element type

Reviewers: cameron.mcinally, efriedma, sdesmalen, paulwalker-arm

Reviewed By: paulwalker-arm

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84211
2020-07-22 14:27:08 -07:00
David Blaikie 5c2451785d DebugInfo: Use debug_line.dwo for debug_macro.dwo
This is an alternative proposal to D81476 (and D82084) - the details were sufficiently confusing to me that it seemed easier to write some code and see how it looks.

Reviewers: SouraVX

Differential Revision: https://reviews.llvm.org/D84278
2020-07-22 14:06:33 -07:00
Mircea Trofin 111a018b36 [llvm][NFC] const-ed MachineBlockFrequencyInfo::isIrrLoopHeader 2020-07-22 13:06:34 -07:00
Andrew Litteken bcbc6117b5 [CGP] Add Pass Dependencies
Add pass dependencies:
  - TargetTransformInfoWrapperPass
  - TargetPassConfig
  - LoopInfoWrapperPass
  - TargetLibraryInfoWrapperPass

To fix inconsistencies when passes are added to the pipeline.

Reviewers: efriedma, kmclaughlin, paquette

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84346
2020-07-22 12:02:53 -07:00
Simon Pilgrim 1c060aa988 DwarfCompileUnit.cpp - remove duplicate includes that already exist in DwarfCompileUnit.h. NFC.
Also remove DIE.h include from DwarfCompileUnit.h and replace with forward declarations.
2020-07-22 19:25:27 +01:00
Simon Pilgrim cd0a36bbda CodeViewDebug.cpp - remove duplicate includes that already exist in CodeViewDebug.h. NFC. 2020-07-22 19:25:27 +01:00
Matt Arsenault b98f902f18 GlobalISel: Restructure argument lowering loop in handleAssignments
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to the original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
2020-07-22 13:31:11 -04:00
jasonliu b98b1700ef [XCOFF] Enable symbol alias for AIX
Summary:
AIX assembly's .set directive is not usable for aliasing purposes.
We need to use an extra-label-at-definition strategy to generate symbol
aliases on AIX.

Reviewed By: DiggerLin, Xiangling_L

Differential Revision: https://reviews.llvm.org/D83252
2020-07-22 14:03:55 +00:00
Simon Pilgrim fa95688237 SelectionDAGBuilder.cpp - remove duplicate includes that already exist in SelectionDAGBuilder.h. NFC. 2020-07-22 14:19:41 +01:00
OCHyams ce6de3747b [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope
Summary:
This patch reduces file size in debug builds by dropping variable locations a
debugger user will not see.

After building the debug entity history map we loop through it. For each
variable we look at each entry. If the entry opens a location range which does
not intersect any of the variable's scope's ranges then we mark it for removal.
After visiting the entries for each variable we also mark any clobbering
entries which will no longer be referenced for removal, and then finally erase
the marked entries. This all requires the ability to query the order of
instructions, so before this runs we number them.
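A simplified C++ sketch of the core filter, assuming instructions have already been numbered and ranges are half-open. It omits the clobber-entry bookkeeping the commit also performs; Range and trimToScope are hypothetical names:

```cpp
#include <cassert>
#include <vector>

// Half-open range [Begin, End) of instruction numbers.
struct Range {
  unsigned Begin;
  unsigned End;
};

bool intersects(const Range &A, const Range &B) {
  return A.Begin < B.End && B.Begin < A.End;
}

// Keep only the location ranges that overlap at least one of the variable's
// scope ranges; the others can never be observed in a debugger.
std::vector<Range> trimToScope(const std::vector<Range> &Locs,
                               const std::vector<Range> &ScopeRanges) {
  std::vector<Range> Kept;
  for (const Range &L : Locs) {
    bool Visible = false;
    for (const Range &S : ScopeRanges)
      if (intersects(L, S)) {
        Visible = true;
        break;
      }
    if (Visible)
      Kept.push_back(L);
  }
  return Kept;
}
```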

Tests:
Added llvm/test/DebugInfo/X86/trim-var-locs.mir

Modified llvm/test/DebugInfo/COFF/register-variables.ll
  Branch folding merges the tails of if.then and if.else into if.else. Each
  block's debug locations point to different scopes, so when they're merged we
  can't use either. Because of this the variable 'c' ends up with a location
  range which doesn't cover any instructions in its scope; with the patch
  applied the location range is dropped and its flag changes to IsOptimizedOut.

Modified llvm/test/DebugInfo/X86/live-debug-variables.ll
Modified llvm/test/DebugInfo/ARM/PR26163.ll
  In both tests an out of scope location is now removed. The remaining location
  covers the entire scope of the variable allowing us to emit it as a single
  location.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D82129
2020-07-22 12:45:21 +01:00
Matt Arsenault bf6bc62d1f GlobalISel: Use Register and update comment physical register syntax 2020-07-21 19:11:57 -04:00
Amara Emerson 791544422a Revert "[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy"
This reverts commit 64eb3a4915.

It caused miscompiles with optimizations enabled. Reverting while I investigate.
2020-07-21 16:01:18 -07:00
Matt Arsenault 7cd8a0256d GlobalISel: Legalize G_FPOWI 2020-07-21 18:13:04 -04:00
Matt Arsenault 7941dc5041 GlobalISel: Translate llvm.powi intrinsic
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
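A C++ sketch of what llvm.powi computes: a floating-point value raised to a 32-bit integer power, here via repeated squaring. powiRef is a hypothetical reference function; the order of multiplications, and therefore rounding, in real lowerings or libcalls may differ:

```cpp
#include <cassert>
#include <cstdint>

double powiRef(double X, int32_t N) {
  int64_t E = N; // widen so that negating INT32_MIN cannot overflow
  const bool Negative = E < 0;
  if (Negative)
    E = -E;
  double Result = 1.0;
  double Base = X;
  while (E != 0) {
    if (E & 1)
      Result *= Base; // include this power-of-two factor
    Base *= Base;
    E >>= 1;
  }
  return Negative ? 1.0 / Result : Result;
}
```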
2020-07-21 18:13:04 -04:00
Matt Arsenault f659c44016 CodeGen: Add support for lowering byref attribute 2020-07-21 17:38:15 -04:00
Matt Arsenault 2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.

The code is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
Guozhi Wei 28759e9fcc [MBP] Use profile count to compute tail dup cost if it is available
Current tail duplication in machine block placement pass uses block frequency
information in cost model. But frequency number has only relative meaning
compared to other basic blocks in the same function. A large frequency number
doesn't mean it is hot and a small frequency number doesn't mean it is cold.

To overcome this problem, this patch uses profile count in cost model if it's
available. So we can tail duplicate real hot basic blocks.
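A C++ sketch of why an absolute profile count is more informative than a relative frequency. It assumes the usual scaling count = EntryCount * BlockFreq / EntryFreq; blockProfileCount is a hypothetical helper, not the MachineBlockFrequencyInfo API:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns no value when the function has no profile, in which case a cost
// model would fall back to plain frequencies.
std::optional<uint64_t> blockProfileCount(std::optional<uint64_t> EntryCount,
                                          uint64_t BlockFreq,
                                          uint64_t EntryFreq) {
  if (!EntryCount || EntryFreq == 0)
    return std::nullopt;
  // Double arithmetic keeps the sketch short and avoids 64-bit overflow of
  // EntryCount * BlockFreq, at the cost of precision for huge values.
  const double Scaled = static_cast<double>(*EntryCount) *
                        static_cast<double>(BlockFreq) /
                        static_cast<double>(EntryFreq);
  return static_cast<uint64_t>(Scaled);
}
```

For example, a block 100x the entry frequency in a function entered once runs about 100 times, while a block at half the entry frequency in a function entered a million times runs about 500000 times. Comparing frequencies alone would rank them the wrong way round.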

Differential Revision: https://reviews.llvm.org/D83265
2020-07-21 11:18:06 -07:00
David Blaikie 38fbba4cb8 DebugInfo: Move getMD5AsBytes from DwarfUnit to DwarfDebug
It wasn't using any state from DwarfUnit anyway.
2020-07-20 19:21:39 -07:00
Matt Arsenault 1ef3ed0eb4 GlobalISel: Rewrite getLCMType
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
2020-07-20 21:06:30 -04:00
Matt Arsenault 12d5bec8c7 GlobalISel: Handle more cases in getGCDType
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.

Also handle mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
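A rough C++ sketch of the size arithmetic behind a GCD-style unmerge type: prefer pieces built from the source element type, and fall back to a narrower scalar only when no such piece divides both sizes. SimpleLLT and gcdPieceType are hypothetical and only approximate the behavior described above; this is not the real getGCDType:

```cpp
#include <cassert>
#include <numeric>

// Tiny stand-in for an LLT: NumElts == 0 means a scalar of EltBits bits.
struct SimpleLLT {
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return (NumElts == 0 ? 1 : NumElts) * EltBits; }
};

SimpleLLT gcdPieceType(SimpleLLT Src, SimpleLLT Target) {
  const unsigned G = std::gcd(Src.sizeInBits(), Target.sizeInBits());
  if (G % Src.EltBits == 0) {
    // Largest multiple of the source element that divides both sizes.
    const unsigned N = G / Src.EltBits;
    return {N == 1 ? 0u : N, Src.EltBits};
  }
  // No whole number of source elements fits: use a narrower scalar.
  return {0, std::gcd(Src.EltBits, Target.sizeInBits())};
}
```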
2020-07-20 20:53:35 -04:00
Eli Friedman b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().
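A scalar C++ model of the mask+compare expansion: truncating each lane to i1 keeps only bit 0, which is the same as testing (x & 1) != 0 per lane. truncToI1 is a hypothetical helper operating on a plain vector, not on SVE types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<bool> truncToI1(const std::vector<int32_t> &Lanes) {
  std::vector<bool> Pred;
  Pred.reserve(Lanes.size());
  for (int32_t X : Lanes)
    Pred.push_back((X & 1) != 0); // mask with 1, then compare against 0
  return Pred;
}
```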

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Yuanfang Chen efcb8a1903 [NFC] remove unneeded TargetLoweringObjectFile init after 85c30f3374 2020-07-20 10:43:28 -07:00
Yuanfang Chen 589c646a7e [llc] (almost) remove `--print-machineinstrs`
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel, which could not be done with
`-print-after`/`-stop-after` since the isel pass does not have a command-line name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to the new pass manager, since the isel pass would then have a
command-line name usable with `print-after` or equivalent switches.

The motivation of this patch is to reduce the tests' dependency on a
soon-to-be-deprecated feature.

Reviewed By: arsenm, dsanders

Differential Revision: https://reviews.llvm.org/D83275
2020-07-20 10:43:28 -07:00
Alok Kumar Sharma 2d10258a31 [DebugInfo] Support for DW_AT_associated and DW_AT_allocated.
Summary:
This support is needed for Fortran array variables with the pointer/allocatable
attribute. It enables the debugger to tell whether the variable
is currently allocated/associated.

  for pointer array (before allocation/association)
  without DW_AT_associated

(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_associated

(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>

  for allocatable array (before allocation)

  without DW_AT_allocated

(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_allocated

(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>

    Testing
- unit test cases added
- check-llvm
- check-debuginfo

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83544
2020-07-20 19:54:35 +05:30
Petar Avramovic 6a1030aa0e AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault 5cbd4e415e GlobalISel: Don't handle widenScalar for vector G_INSERT
This handling didn't make any sense for vectors.
2020-07-20 10:06:18 -04:00
Matt Arsenault a679f27e98 GlobalISel: Consistently get TII from MIRBuilder 2020-07-20 10:06:18 -04:00
Petar Avramovic ba938f6388 AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.

Differential Revision: https://reviews.llvm.org/D84010
2020-07-20 11:06:11 +02:00
Evgeny Leviant 24089928be [CodeGen][TargetPassConfig] Add TargetTransformInfo pass correctly
This patch adds the TTI pass directly, ensuring that it runs with a correctly set
TargetTransformInfo.

Differential revision: https://reviews.llvm.org/D84047
2020-07-18 14:11:40 +03:00
Aditya Nandakumar 63c081e73d [GISel]: Add support for CSEing SrcOps which are immediates
https://reviews.llvm.org/D84072

Add G_EXTRACT to CSEConfigFull and add unit test as well.
2020-07-17 16:04:24 -07:00
Sam Tebbs 6c348e4067 [HWLoops] Stop converting to a while loop when it would be unsafe to
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.

This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop.

Differential Revision: https://reviews.llvm.org/D83953
2020-07-17 11:47:08 +01:00
Jay Foad 62fd7f767c [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.
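A C++ sketch contrasting the old and new gating condition for the depth comparison. Which single candidate the old code checked is modelled here, as an assumption, as the current best candidate; both function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>

// Old gate: only one candidate's depth was compared with the latency
// already scheduled in the zone.
bool depthHeuristicAppliesOld(unsigned CandDepth, unsigned /*TryDepth*/,
                              unsigned ScheduledLatency) {
  return CandDepth > ScheduledLatency;
}

// Fixed gate: apply the heuristic if either candidate would stall.
bool depthHeuristicAppliesNew(unsigned CandDepth, unsigned TryDepth,
                              unsigned ScheduledLatency) {
  return std::max(CandDepth, TryDepth) > ScheduledLatency;
}
```

With a scheduled latency of 10, a current candidate of depth 5 and a new candidate of depth 12, the old gate skips the comparison and a lower priority heuristic may pick the deep node and stall, while the new gate lets TopDepthReduce prefer the shallower one.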

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Florian Hahn e297006d6f [ScheduleDAG] Move DBG_VALUEs after first term forward.
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring to a debug value that gets folded into a TCRETURN
instruction on ARM.

This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators that produce
values that can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like currently there is no way to insert proper DBG_VALUEs
for such registers anyway.

Alternatively it might make sense to just remove those extra DBG_VALUEs.

I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.
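A C++ sketch of the reordering on a toy block model: every DBG_VALUE found after the first terminator is moved to just before it, preserving the relative order of both groups. Instr and hoistDbgValuesAboveFirstTerminator are hypothetical; this is not the EmitSchedule code:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Normal, Terminator, DbgValue };

struct Instr {
  Kind K;
  std::string Name;
};

void hoistDbgValuesAboveFirstTerminator(std::vector<Instr> &MBB) {
  auto FirstTerm = std::find_if(MBB.begin(), MBB.end(), [](const Instr &I) {
    return I.K == Kind::Terminator;
  });
  if (FirstTerm == MBB.end())
    return;
  // Within [FirstTerm, end), move DBG_VALUEs in front of the terminators;
  // stable_partition keeps the original order inside each group.
  std::stable_partition(FirstTerm, MBB.end(), [](const Instr &I) {
    return I.K == Kind::DbgValue;
  });
}
```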

Reviewers: vsk, aprantl, jpaquette, efriedma, paquette

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83561
2020-07-17 10:27:43 +01:00
Igor Kudrin f76a0cd97a [DebugInfo] Fix a misleading usage of DWARF forms with DIEExpr. NFCI.
For now, DIEExpr is used only in two places:

 1) in the debug info library unit test suite to emit
    a DW_AT_str_offsets_base attribute with the DW_FORM_sec_offset
    form, see dwarfgen::DIE::addStrOffsetsBaseAttribute();

 2) in DwarfCompileUnit::addLocationAttribute() to generate the location
    attribute for a TLS variable.

The latter case used an incorrect DWARF form of DW_FORM_udata, which
implies storing a uleb128 value, not a 4/8 byte constant. The generated
result was nevertheless as expected because DIEExpr::SizeOf() did not handle
that form and fell back to returning the size of a code pointer.

The patch fixes the issue by using more appropriate DWARF forms for
the problematic case and making DIEExpr::SizeOf() more straightforward.
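To see why DW_FORM_udata does not fit a fixed-size value, here is a small C++ ULEB128 encoder: the encoded length depends on the magnitude of the value, whereas a code pointer always takes a fixed 4 or 8 bytes. encodeULEB128 is an illustrative helper, not LLVM's own function of that name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 7 payload bits per byte, high bit set while more bytes follow.
std::vector<uint8_t> encodeULEB128(uint64_t V) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
  return Out;
}
```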

Differential Revision: https://reviews.llvm.org/D83958
2020-07-17 13:49:27 +07:00
Denis Antrushin e04fe9aefd [Statepoint] Fix bug found by sanitizer.
Statepoint has no static operands, so it cannot be verified
against MCInstrDescr. Revert NumDefs change introduced by ef658ebd62.
2020-07-16 23:06:53 +03:00
Nadav Rotem a394aa1b97 [LiveVariables] Replace std::vector with SmallVector.
Replace std::vector with SmallVector to reduce the number of mallocs.
This method is frequently executed, and the number of elements in the
vector is typically small.

https://reviews.llvm.org/D83920
2020-07-16 11:39:54 -07:00
Matt Arsenault 9d3e56e2ee DAG: Try scalarizing when expanding saturating add/sub
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.

Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
2020-07-16 14:05:16 -04:00
Denis Antrushin ef658ebd62 MIR Statepoint refactoring. Part 1: Basic MI level changes.
Basic support for variadic-def MIR Statepoint:
- Change TableGen STATEPOINT description to variadic out list
  (For self-documentation purpose; by itself it does not affect
  code generation in any way).
- Update StatepointOpers helper class to handle variadic defs.
- Update MachineVerifier to properly handle them, too.

With this change, new Statepoint instruction can be passed through
backend (excluding ISEL) without errors.

Full change set is available at D81603.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81645
2020-07-17 00:57:21 +07:00
Matt Arsenault 023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions: one for the existing cases, which have an associated value copy,
and one for byref, which does not. Most, but not all, of the existing uses will
keep using the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
Petar Avramovic 6850033ca6 AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legalize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.

Differential Revision: https://reviews.llvm.org/D83880
2020-07-16 16:31:57 +02:00
James Y Knight 60433c63ac Remove TwoAddressInstructionPass::sink3AddrInstruction.
This function has a bug which will incorrectly reschedule instructions
after an INLINEASM_BR (which can branch). (The bug may also allow
scheduling past a throwing-CALL, I'm not certain.)

I could fix that bug, but, as the removed FIXME notes, it's better to
attempt rescheduling before converting to 3-addr form, as that may
remove the need to convert in the first place. In fact, the code to do
such reordering was added to this pass only a few months later, in
2011, via the addition of the function rescheduleMIBelowKill. That
code does not contain the same bug.

The removal of the sink3AddrInstruction function is not a no-op: in
some cases it would move an instruction post-conversion, when
rescheduleMIBelowKill would not move the instruction pre-conversion.
However, this does not appear to be important: the machine instruction
scheduler can reorder the after-conversion instructions, in any case.

This patch fixes a kernel panic in 4.4 LTS x86_64 Linux kernels when
built with clang after 4b0aa5724f.

Link: https://github.com/ClangBuiltLinux/linux/issues/1085

Differential Revision: https://reviews.llvm.org/D83708
2020-07-16 10:02:52 -04:00
Kerry McLaughlin 2762da0a16 [SVE][CodeGen] Legalisation of masked loads and stores
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.

Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions.

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83137
2020-07-16 10:55:45 +01:00
Quentin Colombet 294be6b5d3 [CalcSpillWeights] Propagate the fact that a live-interval is not spillable
When we calculate the weight of a live-interval, add some code to
check if the original live-interval was marked as not spillable and
if so, propagate that information down to the new interval.

Previously we would just recompute a weight for the new interval,
thus, we could in theory just spill live-intervals marked as not
spillable by just splitting them. That goes against the spirit of
a non-spillable live-interval.

E.g., previously we could do:
v1 =  // v1 must not be spilled
...
= v1

Split:
v1 = // v1 must not be spilled
...
v2 = v1 // v2 can be spilled
...
v3 = v2 // v3 can be spilled
= v3

There's no test case for that one as we would need to split a
non-spillable live-interval without using LiveRangeEdit to see this
happening.
RegAlloc inserts non-spillable intervals only as part of the spilling
mechanism, thus at this point the intervals are not splittable anymore.
On top of that, RegAlloc uses the LiveRangeEdit API, which already
properly propagates that information.

In other words, this could only happen if a target was to mark
a live-interval as not spillable before register allocation and
split it without using LRE, e.g., through
LiveIntervals::splitSeparateComponent.
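
A toy model of the propagation, for illustration only: `ToyInterval`,
`computeSplitWeight` and `splitInterval` are hypothetical names, not LLVM's
API, and the sketch assumes (as a simplification) that an infinite weight is
what marks an interval as not spillable.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical toy stand-in for a live interval; not LLVM's LiveInterval.
// Assumption for this sketch: an infinite weight means "never spill".
struct ToyInterval {
  float Weight = 0.0f;

  bool isSpillable() const {
    return Weight != std::numeric_limits<float>::infinity();
  }
  void markNotSpillable() { Weight = std::numeric_limits<float>::infinity(); }
};

// Weight for a piece produced by splitting Parent. The recomputed weight
// is only used when the parent itself was spillable; otherwise the
// "not spillable" property is propagated to the new piece.
float computeSplitWeight(const ToyInterval &Parent, float RecomputedWeight) {
  if (!Parent.isSpillable())
    return std::numeric_limits<float>::infinity();
  return RecomputedWeight;
}

// Splits Parent into one piece per entry of RecomputedWeights.
std::vector<ToyInterval> splitInterval(const ToyInterval &Parent,
                                       const std::vector<float> &RecomputedWeights) {
  std::vector<ToyInterval> Pieces;
  for (float W : RecomputedWeights) {
    ToyInterval Piece;
    Piece.Weight = computeSplitWeight(Parent, W);
    Pieces.push_back(Piece);
  }
  return Pieces;
}
```

Splitting a non-spillable v1 into two pieces yields two non-spillable
pieces, while splitting a spillable interval keeps the recomputed weights.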
2020-07-15 17:57:36 -07:00
Hiroshi Yamauchi f233b92f92 [PGO][PGSO] Add profile guided size optimization to LegalizeDAG.
Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83333
2020-07-15 10:03:38 -07:00
Cameron McInally ae51a70030 [Legalize] Hoist invariant condition in ExpandVectorBuildThroughStack(...)
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.

Differential Revision: https://reviews.llvm.org/D83882
2020-07-15 11:05:20 -05:00
Tim Northover 37b96d51d0 CodeGenPrep: remove AssertingVH references before deleting dead instructions.
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).

So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
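
A minimal sketch of the callback idea on a hypothetical toy IR (`ToyInst`
and `recursivelyDeleteDead` are illustrative names, not the real signature
of RecursivelyDeleteTriviallyDeadInstructions): the callback runs before
each deletion, so a caller can drop its own references to the instruction
first.

```cpp
#include <cassert>
#include <functional>
#include <set>      // std::set is used by the usage example below
#include <vector>

// Hypothetical toy instruction: indexes of its operands plus a use count.
struct ToyInst {
  std::vector<int> Operands;
  int NumUses = 0;
  bool Deleted = false;
};

// Deletes the dead instruction at Root and, transitively, every operand
// that becomes dead as a result. AboutToDelete is invoked before each
// deletion so the caller can clean up its own data structures.
void recursivelyDeleteDead(std::vector<ToyInst> &Insts, int Root,
                           const std::function<void(int)> &AboutToDelete) {
  std::vector<int> Worklist{Root};
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    if (Insts[I].Deleted || Insts[I].NumUses != 0)
      continue;
    if (AboutToDelete)
      AboutToDelete(I);
    Insts[I].Deleted = true;
    for (int Op : Insts[I].Operands)
      if (--Insts[Op].NumUses == 0)
        Worklist.push_back(Op);
  }
}
```

For a chain 0 <- 1 <- 2 where 2 is dead, passing a callback that erases
from a caller-side `std::set` leaves no stale entries behind.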
2020-07-15 15:19:21 +01:00
Tim Northover 5165b2b5fd AArch64+ARM: make LLVM consider system registers volatile.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.

This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
2020-07-15 09:47:36 +01:00
Roger Ferrer Ibanez 14bc5e149d [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1)
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.
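
A quick exhaustive check of the identity behind the fold, assuming x and y
are i1 values (the sketch models them as C++ `bool`; the function names are
illustrative):

```cpp
#include <cassert>

// (xor (xor x, y), 1) on i1 values.
bool xorXorOne(bool X, bool Y) { return (X ^ Y) ^ true; }

// (setcc x, y, ==) on i1 values.
bool setccEq(bool X, bool Y) { return X == Y; }

// Compares both forms over all four i1 input pairs.
bool formsAgree() {
  for (int X = 0; X <= 1; ++X)
    for (int Y = 0; Y <= 1; ++Y)
      if (xorXorOne(X, Y) != setccEq(X, Y))
        return false;
  return true;
}
```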

This fixes PR42876.

Differential Revision: https://reviews.llvm.org/D65802
2020-07-15 07:34:22 +00:00
Krzysztof Pszeniczny c3e6555616 Call Frame Information (CFI) Handling for Basic Block Sections
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) has to be emitted along with redefining the
Canonical Frame Address (CFA), viz. where the current frame starts.

CFI directives are emitted in FDEs in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region unlike
debug info which has the support for ranges. This is what complicates CFI for
basic block sections.

Now, what happens when we start placing individual basic blocks in unique
sections:

* Basic block sections allow the linker to arbitrarily reorder basic blocks in the
address space such that a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit this
independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee saved registers in the prologue. There are cfi
directives that emit how to retrieve the value of the register at that point
when the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.

Differential Revision: https://reviews.llvm.org/D79978
2020-07-14 12:54:12 -07:00
Logan Smith a19461d9e1 [NFC] Add 'override' keyword where missing in include/ and lib/.
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple of unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.

Differential Revision: https://reviews.llvm.org/D83709
2020-07-14 09:47:29 -07:00
Paul Walker 6e198aae1d [SelectionDAG] Prevent warnings when extracting fixed length vector from scalable.
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector.  This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.

Differential Revision: https://reviews.llvm.org/D83642
2020-07-14 11:12:56 +00:00
Sam Elliott 1d15bbb9d9 Revert "[RISCV] Avoid Splitting MBB in RISCVExpandPseudo"
This reverts commit 97106f9d80.

This is based on feedback from https://reviews.llvm.org/D82988#2147105
2020-07-14 11:15:01 +01:00
David Sherwood 3b8eaf26db [SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.

I've added a test to

  llvm/test/CodeGen/AArch64/sve-fp.ll

that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.

Differential Revision: https://reviews.llvm.org/D83572
2020-07-14 08:07:30 +01:00
Amara Emerson 64eb3a4915 [AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-13 20:27:45 -07:00
zuojian lin fefe6a6642 Fix undefined behavior in DWARF emission
Caused by uninitialized load of llvm::DwarfDebug::PrevCU:
llvm::DwarfCompileUnit::addRange () at ../lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:276
llvm::DwarfDebug::endFunctionImpl () at ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1586
llvm::DebugHandlerBase::endFunction () at ../lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:319
llvm::AsmPrinter::EmitFunctionBody () at ../lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1230
llvm::ARMAsmPrinter::runOnMachineFunction () at ../lib/Target/ARM/ARMAsmPrinter.cpp:161

Most of the DebugInfo tests fail under `LLVM_LIT_ARGS:STRING=-sv --vg` prior to this fix, and pass with the fix applied.

Reviewed By: aprantl, dblaikie

Differential Revision: https://reviews.llvm.org/D81631
2020-07-13 18:32:36 -07:00
Matt Arsenault 23ec773d19 GlobalISel: Implement fewerElementsVector for saturating add/sub 2020-07-13 14:46:40 -04:00
Matt Arsenault 6a8c11a11f GlobalISel: Implement widenScalar for saturating add/sub
Add a placeholder legality rule for AMDGPU until the rest of the
actions are handled.
2020-07-13 14:46:40 -04:00
Sanjay Patel 8779b11410 [DAGCombiner] rot i16 X, 8 --> bswap X
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.

x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
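
For reference, a self-contained exhaustive check that rotating an i16 by 8
is the same as swapping its two bytes (function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// rot i16 X, 8
uint16_t rotl16By8(uint16_t X) {
  return static_cast<uint16_t>((X << 8) | (X >> 8));
}

// bswap i16 X: exchange the low and high bytes.
uint16_t bswap16(uint16_t X) {
  uint16_t Lo = static_cast<uint16_t>(X & 0xFF);
  uint16_t Hi = static_cast<uint16_t>(X >> 8);
  return static_cast<uint16_t>((Lo << 8) | Hi);
}

// Checks the equivalence for every possible i16 value.
bool rotEqualsBswap() {
  for (uint32_t V = 0; V <= 0xFFFF; ++V)
    if (rotl16By8(static_cast<uint16_t>(V)) != bswap16(static_cast<uint16_t>(V)))
      return false;
  return true;
}
```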
2020-07-13 12:01:53 -04:00
Sanjay Patel 2df46a5743 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
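
A sketch of the byte-level equivalence this relies on, for the i8 case only
(function names are hypothetical): two byte loads stored in reverse order
produce the same memory as one 16-bit load, a rotate by 8, and one 16-bit
store, regardless of target endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Unmerged form: two consecutive loads, stored in reversed order.
void copySwappedBytewise(const uint8_t *Src, uint8_t *Dst) {
  uint8_t A = Src[0], B = Src[1];
  Dst[0] = B;
  Dst[1] = A;
}

// Merged form: one wide load, rotate by half the width, one wide store.
// Rotating a 16-bit value by 8 swaps its bytes on either endianness.
void copySwappedMerged(const uint8_t *Src, uint8_t *Dst) {
  uint16_t V;
  std::memcpy(&V, Src, sizeof(V));
  V = static_cast<uint16_t>((V << 8) | (V >> 8));
  std::memcpy(Dst, &V, sizeof(V));
}
```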

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision: https://reviews.llvm.org/D83567
2020-07-13 08:57:00 -04:00
Sanjay Patel f1bbf3acb4 Revert "[DAGCombiner] allow load/store merging if pairs can be rotated into place"
This reverts commit 591a3af5c7.
The commit message was cut off and failed to include the review citation.
2020-07-13 08:55:29 -04:00
Sanjay Patel 591a3af5c7 [DAGCombiner] allow load/store merging if pairs can be rotated into place
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.

This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895

I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".

Differential Revision:
2020-07-13 08:53:06 -04:00
Kerry McLaughlin afcc9a81d2 [SVE][Codegen] Add a helper function for pointer increment logic
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split

Reviewers: efriedma, sdesmalen, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83577
2020-07-13 10:53:40 +01:00
Petar Avramovic fd85b40aee [GlobalISel][InlineAsm] Fix buildCopy for inputs
Check that input size matches size of destination reg class.
Attempt to extend input size when needed.

Differential Revision: https://reviews.llvm.org/D83384
2020-07-13 10:52:33 +02:00
Sanjay Patel 39009a8245 [DAGCombiner] tighten fast-math constraints for fma fold
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

This is only allowed when "reassoc" is present on the fadd.

As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
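
A small numeric illustration in C++, assuming IEEE-754 doubles and that the
compiler does not itself contract `C * D` into an FMA (e.g. build with
`-ffp-contract=off`). With A = B = 0 the only difference between the two
forms is whether C*D is rounded before E is added:

```cpp
#include <cassert>
#include <cmath>

// fadd (fma A, B, (fmul C, D)), E
// C*D is rounded on its own, then E is added with a separate rounding.
double beforeFold(double A, double B, double C, double D, double E) {
  double CD = C * D;                  // rounded product
  double Inner = std::fma(A, B, CD);
  return Inner + E;                   // separately rounded add
}

// fma A, B, (fma C, D, E)
// C*D + E is now computed with a single rounding.
double afterFold(double A, double B, double C, double D, double E) {
  return std::fma(A, B, std::fma(C, D, E));
}
```

With C = D = 1 + 2^-30 and E = -1, the exact C*D + E is 2^-29 + 2^-60;
`afterFold` keeps the 2^-60 term, while `beforeFold` loses it when C*D is
rounded, so the results differ.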

I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.

Differential Revision: https://reviews.llvm.org/D82499
2020-07-12 08:51:49 -04:00
Alexandre Ganea b71499ac9e Revert "Re-land [CodeView] Add full repro to LF_BUILDINFO record"
This reverts commit add59ecb34 and 41d2813a5f.
2020-07-10 19:46:16 -04:00
Alexandre Ganea add59ecb34 Re-land [CodeView] Add full repro to LF_BUILDINFO record
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).

Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command line, but an expanded version (in effect, their equivalent of CC1), which is also freestanding.

For more information see PR36198 and D43002.

Differential Revision: https://reviews.llvm.org/D80833
2020-07-10 13:59:28 -04:00
Sanjay Patel 02fec9d2a5 [DAGCombiner] move/rename variables for readability; NFC 2020-07-10 11:28:51 -04:00
David Sherwood da731894a2 [CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer::SetSplitVector
In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert
from getVectorNumElements() to getVectorElementCount(), since this
code path works for both fixed and scalable vectors.

This fixes up one warning in the test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83196
2020-07-10 08:29:17 +01:00
David Sherwood 229dfb4728 [CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.

Fixes warnings in this test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83203
2020-07-10 08:11:30 +01:00
Sanjay Patel a46cf40240 [DAGCombiner] convert if-chain in store merging to switch; NFC 2020-07-09 17:20:04 -04:00
Sanjay Patel b476e6a642 [DAGCombiner] add helper function for store merging of loaded values; NFC 2020-07-09 17:20:04 -04:00
Sanjay Patel f98a602c2e [DAGCombiner] add helper function for store merging of extracts; NFC 2020-07-09 17:20:03 -04:00
Sanjay Patel 8d74cb01b7 [DAGCombiner] add helper function for store merging of constants; NFC 2020-07-09 17:20:03 -04:00
Sanjay Patel 6890e2a17b [DAGCombiner] add helper function to manage list of consecutive stores; NFC 2020-07-09 17:20:03 -04:00
Christopher Tetreault ff5b9a7b3b [SVE] Remove calls to VectorType::getNumElements from CodeGen
Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm

Reviewed By: RKSimon

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82210
2020-07-09 12:43:36 -07:00
Sam Elliott 97106f9d80 [RISCV] Avoid Splitting MBB in RISCVExpandPseudo
Since the `RISCVExpandPseudo` pass has been split from
`RISCVExpandAtomicPseudo` pass, it would be nice to run the former as
early as possible (The latter has to be run as late as possible to
ensure correctness). Running earlier means we can reschedule these pairs
as we see fit.

Running earlier in the machine pass pipeline is good, but would mean
teaching many more passes about `hasLabelMustBeEmitted`. Splitting the
basic blocks also pessimises possible optimisations because some
optimisations are MBB-local, and others are disabled if the block has
its address taken (which is notionally what `hasLabelMustBeEmitted`
means).

This patch uses a new approach of setting the pre-instruction symbol on
the AUIPC instruction to a temporary symbol and referencing that. This
avoids splitting the basic block, but allows us to reference exactly the
instruction that we need to. Notionally, this approach seems more
correct because we do actually want to address a specific instruction.

This then allows the pass to be moved much earlier in the pass pipeline,
before both scheduling and register allocation. However, to do so we
must leave the MIR in SSA form (by not redefining registers), and so use
a virtual register for the intermediate value. By using this virtual
register, this pass now has to come before register allocation.

Reviewed By: luismarques, asb

Differential Revision: https://reviews.llvm.org/D82988
2020-07-09 13:54:13 +01:00
Lucas Prates fc39a9ca0e [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand
Summary:
When legalizing a bitcast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32 bits, an assertion can fail when building the new node due to a
mismatch between the operation's result size and the type specified to
the node.

This patch fixes the issue by making sure the bit widths of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.

Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82552
2020-07-09 09:46:17 +01:00
serge-sans-paille a60c31fd62 Fix return status of AtomicExpandPass
Correctly reflect change in the return status.

Differential Revision: https://reviews.llvm.org/D83457
2020-07-09 10:27:48 +02:00
Qiu Chaofan 4254ed5c32 [Legalizer] Fix wrong operand in split vector helper
This appears to be a typo introduced in D69275, which may cause an
unexpected segmentation fault in getNode.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D83376
2020-07-09 09:57:29 +08:00
Matt Arsenault 18bd821f02 DAG: Remove redundant finalizeLowering call
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this, and move all isel pseudo expansion to FinalizeISel. This was a
bad rebase or something, and failed to actually delete this call.

GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
2020-07-08 18:48:20 -04:00
Matt Arsenault 2ec5fc0c61 DAG: Remove redundant handling of reg fixups
It looks like 9cac4e6d14 accidentally
added a second copy of this from a bad rebase or something. This
second copy was added, and the finalizeLowering call was not deleted
as intended.
2020-07-08 18:32:43 -04:00
Matt Arsenault 74a148ad39 GlobalISel: Verify G_BITCAST changes the type
Updated the AArch64 tests the best I could with my vague, inferred
understanding of AArch64 register banks. As far as I can tell, there
is only one 32-bit/64-bit type which will use the gpr register bank,
so we have to use the fpr bank for the other operand.
2020-07-08 17:16:27 -04:00
Sanjay Patel 1265eb2d5f [DAGCombiner] clean up in mergeConsecutiveStores(); NFC 2020-07-08 14:48:05 -04:00
Sanjay Patel 12c2271e53 [DAGCombiner] fix code comment and improve readability; NFC 2020-07-08 14:48:05 -04:00
Sanjay Patel 683a7f7025 [DAGCombiner] fix function-name formatting; NFC 2020-07-08 12:49:59 -04:00
Sanjay Patel 39329d5724 [DAGCombiner] add enum for store source value; NFC
This removes existing code duplication and allows us to
assert that we are handling the expected cases.

We have a list of outstanding bugs that could benefit by
handling truncated source values, so that's a possible
addition going forward.
2020-07-08 12:49:59 -04:00
Evgeny Leviant a074984250 [MIR] Speedup parsing of function with large number of basic blocks
This patch eliminates the string length calculation when lexing a token. The speedup can be up to
1000x.

Differential revision: https://reviews.llvm.org/D83389
2020-07-08 18:50:00 +03:00
Paul Walker bb35f0fd89 [SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS.
ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS.
However, when calculating the offsets for each of the operands we
incorrectly use the element size rather than the operand's actual size,
and thus the stores overlap.
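
A toy sketch of the corrected offset computation (the name
`concatThroughStack` is hypothetical, and the sketch assumes a non-empty list
of equally sized operands): each whole operand is stored at
`I * OperandBytes`, so the stores no longer overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Concatenates equally sized operand vectors through a "stack" buffer.
// For CONCAT_VECTORS each operand is a whole vector, so the stride between
// stores must be the operand size; using the element size would overlap.
std::vector<uint32_t> concatThroughStack(
    const std::vector<std::vector<uint32_t>> &Ops) {
  const size_t OperandBytes = Ops.front().size() * sizeof(uint32_t);
  std::vector<uint8_t> Stack(OperandBytes * Ops.size());
  for (size_t I = 0; I < Ops.size(); ++I)
    std::memcpy(Stack.data() + I * OperandBytes, Ops[I].data(), OperandBytes);
  // Reload the whole buffer as the concatenated result.
  std::vector<uint32_t> Result(Stack.size() / sizeof(uint32_t));
  std::memcpy(Result.data(), Stack.data(), Stack.size());
  return Result;
}
```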

Differential Revision: https://reviews.llvm.org/D83303
2020-07-08 15:39:25 +00:00
Ties Stuij 26a22478cd [CodeGen] Don't combine extract + concat vectors with non-legal types
Summary:
The following combine currently breaks in the DAGCombiner:

```
extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x
   -> extract_vector_elt a, x
```

This happens because, after combining these nodes, we have inserted nodes
that use individual instances of the vector element type (i16 in the above
example). However, this isn't a legal type on all backends, and when the combining
pass calls the legalizer it breaks, as it expects types to already be legal. The
type legalizer has already been run, and running it again would make a mess of the nodes.

In the example code at least, the generated code is still efficient after the change.

Reviewers: miyuki, arsenm, dmgreen, lebedev.ri

Reviewed By: miyuki, lebedev.ri

Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83231
2020-07-08 15:29:57 +01:00
Petar Avramovic 419c92a749 [GlobalISel][InlineAsm] Fix matching input constraints to mem operand
Mark matching input constraint to mem operand as not supported.

Differential Revision: https://reviews.llvm.org/D83235
2020-07-08 12:32:17 +02:00
Jeremy Morse b9d977b0ca [DWARF] Add cutoff guarding quadratic validThroughout behaviour
Occasionally we see absolutely massive basic blocks, typically in global
constructors that are vulnerable to heavy inlining. When these blocks are
dense with DBG_VALUE instructions, we can hit near quadratic complexity in
DwarfDebug's validThroughout function. The problem is caused by:

  * validThroughout having to step through all instructions in the block to
    examine their lexical scope,
  * and a high proportion of instructions in that block being DBG_VALUEs
    for a unique variable fragment.

This leads to us stepping through every instruction in the block, for (nearly)
each instruction in the block.

By adding this guard, we force variables in large blocks to use a location
list rather than a single-location expression, as shown in the added test.
This shouldn't change the meaning of the output DWARF at all: instead we
use a less efficient DWARF encoding to avoid a poor-performance code path.

Differential Revision: https://reviews.llvm.org/D83236
2020-07-08 10:30:09 +01:00
David Sherwood 9e66e9c30a [CodeGen] Fix wrong use of getVectorNumElements() in DAGTypeLegalizer::SplitVecRes_ExtendOp
In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid
call to getVectorNumElements() with a call to getVectorMinNumElements(),
since the code path works for both fixed and scalable vectors.

This fixes up a warning in the following test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83197
2020-07-08 09:53:20 +01:00
David Sherwood 5b14f5051f [CodeGen] Fix wrong use of getVectorNumElements in PromoteIntRes_EXTRACT_SUBVECTOR
Calling getVectorNumElements() is not safe for scalable vectors and we
should normally use getVectorElementCount() instead. However, for the
code changed in this patch I decided to simply move the instantiation of
the variable 'OutNumElems' lower down to the place where only fixed-width
vectors are used, and hence it is safe to call getVectorNumElements().

Fixes up one warning in this test:

  sve-sext-zext.ll

Differential Revision: https://reviews.llvm.org/D83195
2020-07-08 09:36:34 +01:00
David Sherwood 15aeb805dc [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll
For the GetElementPtr case in function
  AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now, as the subsequent optimisations don't currently support them.

This change fixes up all warnings in the following tests:

  llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential Revision: https://reviews.llvm.org/D83124
2020-07-08 09:16:00 +01:00
Heejin Ahn 7e6793aa33 [WebAssembly] Generate unreachable after __stack_chk_fail
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D83277
2020-07-08 01:02:05 -07:00
serge-sans-paille edc7da2405 Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare
optimizeMemoryInst was reporting no change while still modifying the IR.
Inspect the status of TypePromotionTransaction to get a better status.

Related to https://reviews.llvm.org/D80916

Differential Revision: https://reviews.llvm.org/D81256
2020-07-08 08:35:44 +02:00
Philip Reames 22596e7b2f [Statepoint] Use early return to reduce nesting and clarify comments [NFC] 2020-07-07 16:19:05 -07:00
Philip Reames 9955876d74 [Statepoint] Reduce indentation and change a variable name [NFC] 2020-07-07 16:19:05 -07:00
Matt Arsenault 23157f3bdb GlobalISel: Handle EVT argument lowering correctly
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
2020-07-07 16:36:14 -04:00
Philip Reames b172cd7812 [Statepoint] Factor out logic for non-stack non-vreg lowering [almost NFC]
This is inspired by D81648.  The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic.

This is not NFC in that the handling of constants larger than 64 bits has changed.  The old lowering would crash on values which could not be encoded as a sign-extended 64-bit value.  The new lowering just spills all constants larger than 64 bits.  We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.
2020-07-07 13:34:28 -07:00
Stanislav Mekhanoshin 7c03872645 LIS: fix handleMove to properly extend main range
handleMoveDown or handleMoveUp cannot properly repair a main
range of a LiveInterval since they only get a LiveRange. There
is a problem if a certain use has moved a few segments away and
there is a hole in the main range between these two
locations. We may get a SubRange with a very extended Segment
spanning several Segments of the main range and also spanning
that hole. If that happens, we end up with the main range
not covering its SubRange, which is an error.

It might be possible to attempt fixing the main range in place
just between the old and new index by extending all of its
Segments in between, but it is unclear whether this logic would be
faster than just calling constructMainRangeFromSubranges,
which itself is pretty cheap since it only contains interval
logic. That would also require a subsequent shrinkToUses() call,
which is probably even more expensive.
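
A toy illustration of the invariant being restored, using hypothetical names
and plain integer slot indexes rather than LLVM's SlotIndex and VNInfo:
rebuilding the main range as the union of all subrange segments guarantees
that it covers every subrange, including across a hole in the old main range.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// A half-open segment [Start, End) in toy slot-index units.
using Segment = std::pair<unsigned, unsigned>;

// Builds a main range as the union of all subrange segments, merging
// segments that overlap or touch. Value numbers are ignored in this toy.
std::vector<Segment> mainRangeFromSubranges(
    const std::vector<std::vector<Segment>> &SubRanges) {
  std::vector<Segment> All;
  for (const auto &SR : SubRanges)
    All.insert(All.end(), SR.begin(), SR.end());
  std::sort(All.begin(), All.end());
  std::vector<Segment> Main;
  for (const Segment &S : All) {
    if (!Main.empty() && S.first <= Main.back().second)
      Main.back().second = std::max(Main.back().second, S.second);
    else
      Main.push_back(S);
  }
  return Main;
}
```

Here one subrange has a hole between 80 and 88, but another subrange's
segment spans it, so the rebuilt main range has no hole.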

In the test, the second move is from 64B to 92B for sub1.
Subrange is correctly fixed:

L000000000000000C [16r,32B:0)[32B,92r:1)  0@16r 1@32B-phi

But the main range has a hole between 80d and 88r after
updateRange():

%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2)

Since the source position is 64B, this segment is not even considered
by updateRange().

Differential Revision: https://reviews.llvm.org/D82916
2020-07-07 11:52:32 -07:00
Kerry McLaughlin cdf2eef613 [SVE][CodeGen] Legalisation of unpredicated store instructions
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83041
2020-07-07 11:47:10 +01:00
Kerry McLaughlin 5e8084beba [SVE][CodeGen] Legalisation of unpredicated load instructions
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.

This patch also adds a DAG combiner fold to visitADD for vscale:
 - Fold (add (vscale(C0)), (vscale(C1))) to (vscale(C0 + C1))
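
The fold is valid for any runtime vscale because multiplication distributes
over addition, even with wraparound. A minimal sketch that models vscale(C)
as the 64-bit product `C * VScale` (an assumption made for illustration; the
function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Toy model of vscale(C): the runtime value C * vscale, wrapping mod 2^64.
uint64_t vscaleTimes(uint64_t C, uint64_t VScale) { return C * VScale; }

// (add (vscale C0), (vscale C1)) versus (vscale (C0 + C1)).
bool foldHolds(uint64_t C0, uint64_t C1, uint64_t VScale) {
  return vscaleTimes(C0, VScale) + vscaleTimes(C1, VScale) ==
         vscaleTimes(C0 + C1, VScale);
}
```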

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82792
2020-07-07 11:05:03 +01:00
David Sherwood 79d34a5a1b [SVE][CodeGen] Fix bug when falling back to DAG ISel
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.

I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.

Differential Revision: https://reviews.llvm.org/D82524
2020-07-07 09:23:04 +01:00
David Sherwood c061e56e88 [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll
This patch fixes all remaining warnings in:

  llvm/test/CodeGen/AArch64/sve-trunc.ll
  llvm/test/CodeGen/AArch64/sve-vector-splat.ll

I hit some warnings related to getCopyPartsToVector. I fixed two
issues:

1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.

Differential Revision: https://reviews.llvm.org/D83028
2020-07-07 09:21:47 +01:00
Sanjay Patel ea71ba11ab [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division
X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z)

In the motivating case from PR46406:
https://bugs.llvm.org/show_bug.cgi?id=46406
...this is restoring the sequence that was originally in the source code.
We extracted a term from within the sqrt because we do not know in
instcombine whether a target will expand a sqrt call.
Note: we could say that the transform in IR should be restricted, but
that would not solve the problem if the source was originally in the
pattern shown here.
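The identity relies on sqrt(A*A*Z) == fabs(A) * sqrt(Z) for Z >= 0 in exact arithmetic. The following is a minimal numeric sketch of the two forms. The function names are hypothetical, and rsqrt is computed exactly here, whereas a target would use a reciprocal-square-root estimate.

```python
import math


def original_form(x: float, a: float, z: float) -> float:
    # X / (fabs(A) * sqrt(Z)): a sqrt, a multiply and a division.
    return x / (abs(a) * math.sqrt(z))


def reassociated_form(x: float, a: float, z: float) -> float:
    # X * rsqrt(A*A*Z): the division disappears once rsqrt is available.
    rsqrt = 1.0 / math.sqrt(a * a * z)  # stand-in for a hardware rsqrt estimate
    return x * rsqrt
```

The two forms agree up to rounding for ordinary inputs. However, A*A can overflow where fabs(A) does not, for example with A = 1e200. This is one illustration of why the transform needs fast-math flags rather than being applied unconditionally.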

This is a gray area for fast-math-flag requirements. I think we should at
least check fast-math-flags on the fdiv and fmul because I view this
transform as 2 pieces: reassociate the fmul operands and form reciprocal
from the fdiv (as with the existing transform). We could argue that the
sqrt also needs FMF, but that was not required before, so we should change
that in a follow-up patch if that seems better.

We don't currently have a way to check that the target will produce a sqrt
or recip estimate without actually creating nodes (the APIs are SDValue
getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up
speculatively created nodes if we are not able to create an estimate.
The x86 test with doubles verifies that we are not changing a test with
no estimate sequence.

Differential Revision: https://reviews.llvm.org/D82716
2020-07-06 19:12:21 -04:00
Yuanfang Chen 1e495e10e6 [NFC] change getLimitedCodeGenPipelineReason to static function 2020-07-06 15:39:27 -07:00
Nicolai Hähnle 76c5cb05a3 DomTree: Remove getChildren() accessor
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.

New methods are introduced to cover common access patterns.

Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f

Reviewers: arsenm, RKSimon, mehdi_amini, courbet

Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83083
2020-07-06 21:58:11 +02:00
jasonliu 6d3ae365bd [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s)
Summary:

When a desired symbol name contains invalid characters that the
system assembler cannot process, we need to emit a .rename
directive in the assembly path so that the desired symbol name
appears in the symbol table.

Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L

Differential Revision: https://reviews.llvm.org/D82481
2020-07-06 15:49:15 +00:00
Matt Arsenault 521ebc1681 GlobalISel: Move finalizeLowering call later
This matches the DAG behavior where this is called after the loop
checking for calls. The AMDGPU implementation depends on knowing if
there are calls in the function or not, so move this later.

Another problem is that finalizeLowering is actually called twice; I was
seeing weird inconsistencies since the first call would produce
unexpected results and the second run would correct them in some
contexts. Since this requires disabling the verifier, and it's useful
to serialize the MIR immediately after selection, FinalizeISel should
probably not be a real pass.
2020-07-06 09:19:40 -04:00
Jay Foad babbeafa00 [TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount
Use a simpler code sequence when the shift amount is known not to be
zero modulo the bit width.

Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
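To see why a known non-zero amount helps, consider a shift by the full bit width. Such a shift is undefined, so a general expansion must avoid `Y >> BW` when Z % BW == 0. One common way is to split the right shift into two parts. When Z % BW is known to be non-zero, the plain `X << S | Y >> (BW - S)` form is safe. The sketch below is a 32-bit Python model under those assumptions. The helper names are hypothetical, and the exact node sequence emitted by TargetLowering may differ.

```python
BW = 32
MASK = (1 << BW) - 1


def shl(v: int, amt: int) -> int:
    # Model a target shift: amounts >= BW are undefined, so reject them.
    assert 0 <= amt < BW, "shift amount out of range"
    return (v << amt) & MASK


def lshr(v: int, amt: int) -> int:
    assert 0 <= amt < BW, "shift amount out of range"
    return v >> amt


def fshl_reference(x: int, y: int, z: int) -> int:
    # Funnel shift left: the high BW bits of the 2*BW-bit value (x:y) << (z % BW).
    s = z % BW
    return ((((x << BW) | y) << s) >> BW) & MASK


def fshl_general(x: int, y: int, z: int) -> int:
    # Safe for any z: the right shift is split so that no single shift reaches BW.
    s = z % BW
    return shl(x, s) | lshr(lshr(y, 1), BW - 1 - s)


def fshl_nonzero(x: int, y: int, z: int) -> int:
    # Simpler sequence, valid only when z % BW != 0 (otherwise BW - s == BW).
    s = z % BW
    return shl(x, s) | lshr(y, BW - s)
```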

Differential Revision: https://reviews.llvm.org/D82540
2020-07-06 12:07:14 +01:00
Jay Foad e7a4a24dc5 [TargetLowering] Improve expansion of ROTL/ROTR
Using a negation instead of a subtraction from a constant can save an
instruction on some targets.

Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.

Differential Revision: https://reviews.llvm.org/D82539
2020-07-06 12:07:14 +01:00
Craig Topper 76123d338d [DAGCombiner] visitSIGN_EXTEND_INREG should fold sext_vector_inreg(undef) to 0 not undef.
We need to ensure that the sign bits of the result all match
so we can't fold to undef.

Similar to PR46585.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D83163
2020-07-04 14:35:49 -07:00
Craig Topper 120c5f1057 [DAGCombiner] Don't fold zext_vector_inreg/sext_vector_inreg(undef) to undef. Fold to 0.
zext_vector_inreg needs to produce 0s in the extended bits and
sext_vector_inreg needs to produce upper bits that are all the
same. So we should fold them to a 0 vector instead of undef.

Fixes PR46585.
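A minimal sketch of the invariants involved, assuming a single integer lane widened from `from_bits` to `to_bits`. The helper names are hypothetical. An undef result is not guaranteed to satisfy either invariant, but a zero lane satisfies both, which is why 0 is the safe fold.

```python
def is_valid_zext(value: int, from_bits: int, to_bits: int) -> bool:
    # A zero-extended lane must have every bit at or above from_bits cleared.
    assert 0 <= value < (1 << to_bits)
    return value >> from_bits == 0


def is_valid_sext(value: int, from_bits: int, to_bits: int) -> bool:
    # A sign-extended lane must have bits from_bits-1 .. to_bits-1 all equal,
    # i.e. every upper bit is a copy of the original sign bit.
    assert 0 <= value < (1 << to_bits)
    top = value >> (from_bits - 1)
    return top == 0 or top == (1 << (to_bits - from_bits + 1)) - 1
```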
2020-07-04 11:42:53 -07:00
Simon Pilgrim 56a8a5c9fe [DAG] matchBinOpReduction - match subvector reduction patterns beyond a matched shufflevector reduction
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.

Before this, it's likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.

Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.

Fixes PR37890
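The reduction shape being matched can be sketched as a loop. Each step splits the vector in half and combines the halves lane by lane, whether it is done with extract_subvector at wide widths or with a shufflevector once the width is legal. This is a minimal model assuming a power-of-two length. `reduce_by_halving` is a hypothetical name and does not correspond to code in matchBinOpReduction.

```python
from typing import Callable, List


def reduce_by_halving(vec: List[int], binop: Callable[[int, int], int]) -> int:
    """Reduce a power-of-two-length vector by repeatedly splitting it in half
    and applying binop lane-wise to the two halves."""
    assert len(vec) > 0 and len(vec) & (len(vec) - 1) == 0, "length must be a power of two"
    while len(vec) > 1:
        half = len(vec) // 2
        lo, hi = vec[:half], vec[half:]  # two subvector extractions
        vec = [binop(a, b) for a, b in zip(lo, hi)]
    return vec[0]
```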
2020-07-04 15:28:15 +01:00
David Green 9e03547cab [ARM][HWLoops] Create hardware loops for sibling loops
Given a loop with two subloops, it should be possible for both to be
converted to hardware loops. That's what this patch does, simply enough.
It slightly alters the loop iterating order to try and convert all
subloops. If one (or more) succeeds, it stops as before.

Differential Revision: https://reviews.llvm.org/D78502
2020-07-03 17:20:02 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Sanjay Patel bc110de78a [SelectionDAG] don't split branch on logic-of-vector-compares
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.

The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.

For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.

Differential Revision: https://reviews.llvm.org/D82602
2020-07-02 17:05:24 -04:00
Sander de Smalen 143e324e75 [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

      extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Reviewers: david-arm, efriedma, spatel

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82910
2020-07-02 10:16:43 +01:00
David Sherwood c7df35d2b2 [CodeGen] Fix warnings in getCopyToPartsVector
Whilst trying to assemble the following test:

  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c

I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.

Differential Revision: https://reviews.llvm.org/D82744
2020-07-02 09:08:20 +01:00
Krzysztof Pszeniczny e4b3c138de This patch adds basic debug info support with basic block sections.
This patch uses ranges for debug information when a function contains basic block sections rather than using [lowpc, highpc]. This is also the first in a series of patches for debug info and does not contain the support for linker relaxation. That will be done as a follow-up patch.

Differential Revision: https://reviews.llvm.org/D78851
2020-07-01 23:53:00 -07:00
Matt Arsenault afb3bd9914 RegAllocGreedy: Use TargetInstrInfo already in the class 2020-07-01 18:58:59 -04:00
Craig Topper 51e92b223b [X86] Speculatively apply the same fix from 361853c96f to PromoteIntOp_MGATHER.
The UpdateNodeOperands here is also subject to CSE.
2020-07-01 11:57:59 -07:00
Craig Topper 361853c96f [LegalizeTypes] Properly handle the case when UpdateNodeOperands in PromoteIntOp_MLOAD triggers CSE instead of updating the node in place.
The caller can't handle the node having multiple results like a
masked load does. So we need to detect the case and do our own
result replacement.

Fixes PR46532.
2020-07-01 11:48:50 -07:00
David Sherwood f11305780f [CodeGen] Fix warnings in DAGCombiner::visitSCALAR_TO_VECTOR
In visitSCALAR_TO_VECTOR we try to optimise cases such as:

  scalar_to_vector (extract_vector_elt %x)

into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.

This change fixes up one of the warnings in:

  llvm/test/CodeGen/AArch64/sve-merging-stores.ll

I've also added a simplified version of the same test to:

  llvm/test/CodeGen/AArch64/sve-fp.ll

which already has checks for no warnings.

Differential Revision: https://reviews.llvm.org/D82872
2020-07-01 18:47:13 +01:00
James Y Knight 4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Yuanfang Chen 78c69a00a4 [NFC] Clean up uses of MachineModuleInfoWrapperPass 2020-07-01 09:45:05 -07:00
David Green ca4c1ad854 [Outliner] Set nounwind for outlined functions
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker, which in turn stops outlining from making
code size worse in C++ code built without exceptions.

Differential Revision: https://reviews.llvm.org/D57254
2020-07-01 17:18:34 +01:00
Guillaume Chatelet ef36f5143d [Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment
As per documentation of `hasPairedLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.

Differential Revision: https://reviews.llvm.org/D82958
2020-07-01 14:32:30 +00:00
Guillaume Chatelet d3085c2501 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82956
2020-07-01 14:31:56 +00:00
Guillaume Chatelet 27bbc8ede1 [Alignment][NFC] Migrate TargetTransformInfo::CreateVariableSizedObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82939
2020-07-01 14:31:21 +00:00
David Sherwood 97a7a9abb2 [CodeGen] Fix up warnings in visitEXTRACT_SUBVECTOR
It's perfectly valid to do certain DAG combines where we extract
subvectors from a concat vector when we have scalable vector types.
However, we can do this in a way that avoids generating compiler
warnings by replacing calls to getVectorNumElements() with
getVectorMinNumElements(). Due to the way subvector extracts are
designed to work with scalable vector types this is ok.

This eliminates some warnings from existing tests in this file:

  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Differential Revision: https://reviews.llvm.org/D82655
2020-07-01 15:10:53 +01:00
David Stenberg 85460c4ea2 [DebugInfo] Do not emit entry values for composite locations
Summary:
This is a fix for PR45009.

When working on D67492 I made DwarfExpression emit a single
DW_OP_entry_value operation covering the whole composite location
description that is produced if a register does not have a valid DWARF
number, and is instead composed of multiple register pieces. Looking
closer at the standard, this appears to not be valid DWARF. A
DW_OP_entry_value operation's block can only be a DWARF expression or a
register location description, so it appears to not be valid for it to
hold a composite location description like that.

See DWARFv5 sec. 2.5.1.7:

"The DW_OP_entry_value operation pushes the value that the described
 location held upon entering the current subprogram. It has two
 operands: an unsigned LEB128 length, followed by a block containing a
 DWARF expression or a register location description (see Section
 2.6.1.1.3 on page 39)."

Here is a dwarf-discuss mail thread regarding this:

http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-March/004610.html

There was not a strong consensus reached there, but people seem to lean
towards the view that operations specified under 2.6 (e.g. DW_OP_piece) may not
be part of a DWARF expression, and thus the DW_OP_entry_value operation
can't contain those.

Perhaps we instead want to emit an entry value operation for each
DW_OP_reg* operation, e.g.:

  - DW_OP_entry_value(DW_OP_regx sub_reg0),
    DW_OP_stack_value,
    DW_OP_piece 8,
  - DW_OP_entry_value(DW_OP_regx sub_reg1),
    DW_OP_stack_value,
    DW_OP_piece 8,
  [...]

The question then becomes how the call site should look; should a
composite location description be emitted there, and we then leave it up
to the debugger to match those two composite location descriptions?
Another alternative could be to emit a call site parameter entry for
each sub-register, but firstly I'm unsure if that is even valid DWARF,
and secondly it seems like that would complicate the collection of call
site values quite a bit. As far as I can tell GCC does not emit any
entry values / call sites in these cases, so we do not have something to
compare with, but the former seems like the more reasonable approach.

Currently when trying to emit a call site entry for a parameter composed
of multiple DWARF registers a (DwarfRegs.size() == 1) assert is
triggered in addMachineRegExpression(). Until the call site
representation is figured out, and until there is use for these entry
values in practice, this commit simply stops the invalid DWARF from
being emitted.

Reviewers: djtodoro, vsk, aprantl

Reviewed By: djtodoro, vsk

Subscribers: jyknight, hiraditya, fedor.sergeev, jrtc27, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D75270
2020-07-01 10:50:55 +02:00
Guillaume Chatelet 7f37d88306 [Alignment][NFC] Migrate MachineFrameInfo::CreateSpillStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82934
2020-07-01 08:49:28 +00:00
Sam Parker 3ee580d017 [ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like
a reduction. This will consist of a vector op (for now only vadd),
a vorr (vmov) which stores the previous value of the vadd, and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.

Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.

Differential Revision: https://reviews.llvm.org/D75533
2020-07-01 08:31:49 +01:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
JF Bastien ca134e4c52 [NFC] fix diagnostic
It's pretty silly to diagnose on a scalar copy but the build does that:
  loop variable 'SibReg' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]
2020-06-30 21:49:01 -07:00
Matt Arsenault e9eab30339 GlobalISel: Disallow undef generic virtual register uses
With an undef operand, it's possible for getVRegDef to fail and return
null. This is an edge case very little code bothered to
consider. Proper gMIR should use G_IMPLICIT_DEF instead.

I initially tried to apply this restriction to all SSA MIR, so then
getVRegDef would never fail anywhere. However, ProcessImplicitDefs
does technically run while the function is in SSA. ProcessImplicitDefs
and DetectDeadLanes would need to either move, or a new pseudo-SSA
type of function property would need to be introduced.
2020-06-30 19:18:01 -04:00
Hendrik Greving 50ac7ce94f [ModuloSchedule] Make PeelingModuloScheduleExpander inheritable.
Basically an NFC change, but it gives subclasses access to the entire PeelingModuloScheduleExpander
class. We are doing this to allow backends, particularly ones that are not necessarily
upstreamed, to inherit from PeelingModuloScheduleExpander and access its basic structures.

Renames Info into LoopInfo for consistency in PeelingModuloScheduleExpander.

Differential Revision: https://reviews.llvm.org/D82673
2020-06-30 15:56:13 -07:00
Hsiangkai Wang a7b0f39185 [MVT] Add new MVT types for RISC-V vector.
In RISC-V vector extension, users could group multiple vector registers
as one pseudo register. In mixed width operations, users could use
partial vector registers to reduce the register pressure. The parameter
to control register grouping and partial use is called LMUL. LMUL is a
part of the type. So, we have a bunch of vector types. In order to
support all these types, we need new MVT types in LLVM. In this patch, I
added several MVT types that are used in RISC-V vector implementation.
This is a standalone patch for MVT types without RISC-V related implementation.

Differential revision: https://reviews.llvm.org/D81724
2020-07-01 01:07:50 +08:00
Matt Arsenault b7f6ecf0c7 RegAlloc: Start using Register 2020-06-30 12:13:08 -04:00
Matt Arsenault af1eeaf380 BranchFolding: Use Register 2020-06-30 12:13:08 -04:00
Matt Arsenault edb4a5cb36 TailDuplicator: Use Register 2020-06-30 12:13:08 -04:00
Guillaume Chatelet 423458ec09 [Alignment][NFC] TargetLowering::allowsMemoryAccessForAlignment
First patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D81372
2020-06-30 15:31:24 +00:00
Guillaume Chatelet c1cd61e02a [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82849
2020-06-30 13:12:31 +00:00
Guillaume Chatelet 306d7c6929 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemmove to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82850
2020-06-30 12:46:59 +00:00
Guillaume Chatelet 6a6af30d43 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82851
2020-06-30 12:46:26 +00:00
Guillaume Chatelet 2c5ff48e61 [Alignment][NFC] Migrate AtomicExpandPass to Align
This is a followup on D78403.
I'm unsure about `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`, shouldn't `getAlign` provide the correct answer already?

Differential Revision: https://reviews.llvm.org/D81369
2020-06-30 09:54:45 +00:00
Petar Avramovic 4b980cc9ca [GlobalISel][InlineAsm] Add support for matching input constraints
Find def operand that corresponds to matching constraint and
tie input to that operand.

Differential Revision: https://reviews.llvm.org/D82651
2020-06-30 10:49:05 +02:00
Guillaume Chatelet 5f8bdb3e6a [Alignment][NFC] TargetLowering::allowsMemoryAccess
Second patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82785
2020-06-30 08:17:00 +00:00
David Sherwood c02332a693 [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D82746
2020-06-30 08:11:41 +01:00
David Sherwood 46a7f4d6f4 [SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of an SVE vector element and
therefore bail out.

Differential Revision: https://reviews.llvm.org/D82564
2020-06-30 07:28:15 +01:00
Guillaume Chatelet 368a5e3a66 [Alignment][NFC] migrate DataLayout::getPreferredAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82752
2020-06-29 11:24:36 +00:00
Simon Pilgrim 3521ecf1f8 [X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.

This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.

Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.
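A minimal sketch of the idea. Take the active bits to be everything from bit 0 up to the highest demanded bit; this is an assumption about how "demanded/active bits" is interpreted. If a constant lane's active bits are all zeros or all ones, the lane can be replaced by 0 or -1 without changing any demanded bit. The helper is hypothetical and ignores the OR/XOR opcode restriction and the per-element demanded masks that the real code handles.

```python
BITS = 64
MASK64 = (1 << BITS) - 1


def shrink_to_allsignbits(elems, demanded):
    """Hypothetical model: replace each lane whose active bits are all-zeros or
    all-ones with the all-sign-bits value 0 or -1 (as an unsigned 64-bit lane)."""
    width = demanded.bit_length()
    if width == 0:
        return list(elems)  # nothing demanded: leave the constant alone
    low_mask = (1 << width) - 1
    out = []
    for c in elems:
        active = c & low_mask
        if active == 0:
            out.append(0)
        elif active == low_mask:
            out.append(MASK64)  # -1 in a 64-bit lane
        else:
            out.append(c)  # mixed bits: not an all-sign-bits pattern
    return out
```

With only bit 0 demanded, <1,1,0,0> becomes <-1,-1,0,0>, which matches the v4i64 constant described above.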

Differential Revision: https://reviews.llvm.org/D82257
2020-06-29 12:19:05 +01:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
Guillaume Chatelet 3500d9ec95 Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt
`ShAmt / 8` can be a non power of two, this can lead to an invalid alignment.
context: https://reviews.llvm.org/D41350#inline-749165
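Why `ShAmt / 8` cannot be used directly: a shift of 24 bits gives a byte offset of 3, and 3 is not a power of two. One safe way to derive an alignment is to take the largest power of two that divides both the base alignment and the byte offset. The bit trick below matches the one used by `MinAlign` in LLVM's MathExtras.h. This sketch does not claim that the patch itself uses it, and `narrowed_access_align` is a hypothetical helper.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def min_align(a: int, b: int) -> int:
    # Largest power of two dividing both a and b (lowest set bit of a | b).
    return (a | b) & -(a | b)


def narrowed_access_align(base_align: int, sh_amt: int) -> int:
    # Hypothetical helper: alignment of an access at byte offset sh_amt // 8
    # from a base with alignment base_align. The result is always a power of two.
    return min_align(base_align, sh_amt // 8)
```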

Differential Revision: https://reviews.llvm.org/D82565
2020-06-29 09:22:15 +00:00
madhur13490 299dee91b3 Revert accidentally landed patch citing build errors
Summary: This reverts commit c73966c2f7.

Reviewers:

Subscribers:
2020-06-28 11:52:33 +00:00
madhur13490 c73966c2f7 Improve stack object printing. NFC.
Reviewers: madhur13490

Reviewed By: madhur13490

Subscribers: qcolombet, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82712
2020-06-28 11:43:33 +00:00
Simon Pilgrim 6bdb3ce452 [DAG] reduceBuildVecExtToExtBuildVec - don't combine if it would break a splat.
reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen.

We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors.

It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one.

Fixes PR46461.
2020-06-27 11:03:57 +01:00
Matt Arsenault c2e403c19d GlobalISel: Don't fail translate on weak cmpxchg
The translation of cmpxchg added by
9481399c0f specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.

This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
2020-06-26 17:52:18 -04:00
Sanjay Patel e7f7715eb9 [DAGCombiner] rename variables for readability; NFC
PR46406 shows a pattern where we can do better, so try to clean this up
before adding more code.
2020-06-26 14:22:11 -04:00
Guillaume Chatelet fdc7c7fb87 [Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82573
2020-06-26 11:00:53 +00:00
Simon Pilgrim da426ead73 LiveRangeEdit.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Move include to LiveRangeEdit.cpp and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-26 09:58:21 +01:00
Sjoerd Meijer 243a5329d4 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
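A minimal sketch of the resulting comparison. It assumes that the second operand is the loop's backedge-taken count, which is what the `ule` comparison implies, and it ignores unsigned wrap-around. `active_lane_mask` is a hypothetical name for this model, not the SelectionDAG code.

```python
def active_lane_mask(base: int, btc: int, num_lanes: int) -> list:
    """Model of lowering @llvm.get.active.lane.mask(base, btc) to
    icmp ule (splat(base) + <0, 1, ..., N-1>), splat(btc)."""
    induction = [base + i for i in range(num_lanes)]  # vector built from the first argument
    limit = [btc] * num_lanes                         # splat of the second argument
    return [iv <= n for iv, n in zip(induction, limit)]
```

For example, with base 8, a backedge-taken count of 9 and four lanes, only the first two lanes are active.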

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Igor Kudrin 70165bb7e9 [DebugInfo] Fix emitting offsets to CUs with -dwarf-sections-as-references=Enable.
The size of the field depends on the DWARF format, not the address size
of the target.
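For example, a 64-bit target emitting 32-bit DWARF still uses 4-byte offsets, and 64-bit DWARF uses 8-byte offsets. The sketch below shows the rule using little-endian encoding, which is an assumption for the example. `encode_cu_offset` is a hypothetical helper, not an AsmPrinter API.

```python
import struct


def encode_cu_offset(offset: int, dwarf64: bool) -> bytes:
    # The field width follows the DWARF format (32- vs 64-bit),
    # not the target's address size.
    return struct.pack("<Q" if dwarf64 else "<I", offset)
```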

Differential Revision: https://reviews.llvm.org/D82311
2020-06-26 12:12:26 +07:00
Wouter van Oortmerssen b9a539c010 [WebAssembly] Adding 64-bit versions of __stack_pointer and other globals
We have 6 globals, all of which except for __table_base are 64-bit under wasm64.

Differential Revision: https://reviews.llvm.org/D82130
2020-06-25 15:52:44 -07:00
Paul Walker 2c09e91054 [MVT] Add missing floating point types for 1024/2048-bit vectors.
Summary:
This patch adds entries for:
  v64f16
  v128f16
  v64bf16
  v128bf16
  v32f64

Subscribers: dschuff, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82466
2020-06-25 21:13:31 +00:00
Simon Pilgrim 1815b77c3e LiveIntervals.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-25 14:22:21 +01:00
Simon Pilgrim 792e4a8c97 CodeGenPrepare.cpp - remove unused IntrinsicsX86.h header. NFC. 2020-06-25 14:22:19 +01:00
Simon Pilgrim 172c36a100 Fix typos in CodeGenPrepare::splitLargeGEPOffsets comments. 2020-06-25 14:22:19 +01:00
Scott Linder 4d81aec40c [MIR] Fix CFI_INSTRUCTION escape printing
Summary:
The printer seems to intend not to print the trailing comma, but has a
copy-paste error for the last value in the escape. The parser enforces
having no trailing comma, but somehow a test was never included to
actually confirm that the two agree.

Reviewers: thegameg, arsenm

Reviewed By: thegameg, arsenm

Subscribers: wdng, arsenm, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82478
2020-06-24 18:15:28 -04:00
Simon Pilgrim a53dddb3e9 Local.h - reduce includes to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 19:27:37 +01:00
Simon Pilgrim bf77c7ef2d Loads.h - reduce AliasAnalysis.h include to forward declarations. NFC.
Fix implicit include dependencies in source files.
2020-06-24 13:49:04 +01:00
Eli Friedman a2caa3b614 Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
hsmahesha 5832950adb [AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class.
Summary:
`width` computation is missing for newly added `MIMG`
instruction class. Add it.

Reviewers: foad, rampitec, arsenm

Reviewed By: foad

Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81649
2020-06-23 17:32:17 +05:30
Kerry McLaughlin 5080503174 [SVE][CodeGen] Legalisation of vsetcc with scalable types
Summary: Changes SplitVecOp_VSETCC to use getVectorElementCount()

Reviewers: sdesmalen, efriedma, dancgr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79167
2020-06-23 11:56:29 +01:00
Simon Pilgrim bcc0dc3832 [DAG] visitSIGN_EXTEND_INREG - rename EVT variable. NFCI.
We had an EVT-typed variable called EVT, which isn't a good idea.
2020-06-23 10:45:27 +01:00
Paul Walker 499c63288f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, Addr) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)
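
The load/store rewrite above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the AArch64 lowering code: a "scalable" register is a std::vector of VL lanes with VL known only at run time, make_pred(N) enables the first N lanes, inactive lanes of the masked load are zeroed, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a predicate enables a subset of a VL-lane register.
using Pred = std::vector<bool>;

// make_pred(N): enable the first N lanes of a VL-lane register.
Pred makePred(std::size_t N, std::size_t VL) {
  assert(N <= VL && "fixed vector must fit in the scalable register");
  Pred P(VL, false);
  for (std::size_t I = 0; I < N; ++I)
    P[I] = true;
  return P;
}

// masked_load: only active lanes read memory; inactive lanes are zeroed here.
std::vector<int> maskedLoad(const int *Addr, const Pred &P) {
  std::vector<int> R(P.size(), 0);
  for (std::size_t I = 0; I < P.size(); ++I)
    if (P[I])
      R[I] = Addr[I];
  return R;
}

// masked_store: only active lanes write memory.
void maskedStore(const std::vector<int> &V, const Pred &P, int *Addr) {
  for (std::size_t I = 0; I < P.size(); ++I)
    if (P[I])
      Addr[I] = V[I];
}

// Fixed-length load = predicated scalable load + extract of the low lanes.
std::vector<int> fixedLoad(const int *Addr, std::size_t NumElts, std::size_t VL) {
  std::vector<int> Wide = maskedLoad(Addr, makePred(NumElts, VL));
  return std::vector<int>(Wide.begin(), Wide.begin() + NumElts);
}

// Fixed-length store = insert into a VL-lane register + predicated store.
void fixedStore(const std::vector<int> &V, int *Addr, std::size_t VL) {
  std::vector<int> Wide(VL, 0);
  for (std::size_t I = 0; I < V.size(); ++I)
    Wide[I] = V[I];
  maskedStore(Wide, makePred(V.size(), VL), Addr);
}
```

Only the active lanes touch memory, which is what lets a fixed-length access run inside a wider scalable register without reading or writing past the fixed vector.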

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00
Simon Pilgrim 0acd22b8fb StatepointLowering.cpp - fix implicit CommandLine.h dependency. NFC.
StatepointLowering defines a cl::opt but doesn't include CommandLine.h.
2020-06-23 09:43:39 +01:00
Michael Liao b1360caa82 [SDAG] Add new AssertAlign ISD node.
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
  where these alignments are retrieved from alignment attributes in LLVM
  IR. These tracked alignments could help DAG combining and lowering
  generate more efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
  we only generate AssertAlign nodes on return values from intrinsic
  calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
  new (base + offset) patterns.
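
To illustrate the kind of fact an AssertAlign node records, the sketch below turns a known power-of-two alignment into known-zero low bits. A combiner could use such a fact to fold `V & 15` to zero when V is known to be 16-byte aligned. The helpers are hypothetical and are not the SelectionDAG or KnownBits API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM's API. A value known to be Align-aligned
// (Align a power of two) has its low log2(Align) bits known to be zero;
// this returns the mask of those bits.
std::uint64_t knownZeroMask(std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "alignment must be a power of two");
  return Align - 1;
}

// (V & Mask) is provably zero if Mask only selects bits known to be zero.
bool andIsKnownZero(std::uint64_t Mask, std::uint64_t Align) {
  return (Mask & ~knownZeroMask(Align)) == 0;
}
```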

Reviewers: arsenm, bogner

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81711
2020-06-23 00:51:11 -04:00
stozer 539381da26 [DebugInfo] Update MachineInstr to help support variadic DBG_VALUE instructions
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.

This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
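
The index arithmetic can be pictured with a toy instruction. This is a hypothetical stand-in, not LLVM's MachineInstr. It assumes, for illustration only, that the two leading operands describe the variable and the expression, and it encodes nothing beyond the stated relationship getDebugOperand(x) == getOperand(x+2).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a variadic debug value instruction, NOT LLVM's
// MachineInstr. Operands are plain strings; operands 0 and 1 are assumed to
// be the variable and expression, with value operands from index 2 onwards.
struct ToyDebugValue {
  std::vector<std::string> Operands;

  const std::string &getOperand(std::size_t I) const { return Operands.at(I); }

  // Encodes the stated relationship: getDebugOperand(x) == getOperand(x + 2).
  const std::string &getDebugOperand(std::size_t X) const { return getOperand(X + 2); }

  // Number of value operands, i.e. everything after the first two.
  std::size_t getNumDebugOperands() const {
    assert(Operands.size() >= 2 && "missing variable/expression operands");
    return Operands.size() - 2;
  }
};
```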

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html

Differential Revision: https://reviews.llvm.org/D81852
2020-06-22 16:01:12 +01:00
Simon Pilgrim 48d1a2d6d0 [DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI.
We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.
2020-06-22 14:24:39 +01:00
Simon Pilgrim ecc5d7ee0d [DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.

We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
2020-06-22 12:35:32 +01:00
Tres Popp 09d72ad399 Revert "[CGP] Enable CodeGenPrepare's phi type conversion."
This reverts commit 67121d7b82.

This is making compile times 2x slower on some large binaries.
2020-06-22 13:06:18 +02:00
David Green 67121d7b82 [CGP] Enable CodeGenPrepare's phi type conversion. 2020-06-21 16:46:16 +01:00
David Green 730ecb63ec [CGP] Convert phi types
If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phis are passed across basic block boundaries. This has to be done in
CodeGenPrepare as it naturally straddles basic blocks.

The algorithm starts from phi nodes, looking at uses and operands for
a collection of nodes that together are bitcast between float and
integer types. We record visited phi nodes so that none is processed
more than once. The whole subgraph is then replaced with a new type.
Loads and stores are bitcast to the correct type, which should then be
folded into the load/store, changing its type.

This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.

I also added undef and extract element handling which increased the
potency in some cases.

This adds it behind an option that defaults to off, and disables it for
32-bit X86 due to potential issues around canonicalizing NaNs.
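
The collection step can be sketched as a worklist over a toy IR. This is a simplified model with hypothetical types, not the CodeGenPrepare implementation. It ignores which operand of a store the phi feeds, the extract element case and the final choice of target type, and only checks that every non-phi neighbour of the phi web is a load, store, bitcast or undef.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, much-simplified IR model (not LLVM's classes).
enum class Kind { Phi, Load, Store, BitCast, Undef, Other };

struct Node {
  Kind K;
  std::vector<Node *> Operands; // incoming values / inputs
  std::vector<Node *> Users;
};

// Collect the web of phis reachable from Start through operands and users.
// Returns true if every non-phi neighbour allows retyping the whole web.
bool collectConvertiblePhiWeb(Node *Start, std::set<Node *> &Visited) {
  std::vector<Node *> Worklist{Start};
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // each phi is processed only once
    for (Node *Op : N->Operands) {
      if (Op->K == Kind::Phi)
        Worklist.push_back(Op);
      else if (Op->K != Kind::Load && Op->K != Kind::BitCast && Op->K != Kind::Undef)
        return false;
    }
    for (Node *U : N->Users) {
      if (U->K == Kind::Phi)
        Worklist.push_back(U);
      else if (U->K != Kind::Store && U->K != Kind::BitCast)
        return false;
    }
  }
  return true;
}
```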

Differential Revision: https://reviews.llvm.org/D81827
2020-06-21 15:54:17 +01:00
David Sherwood 584d0d5c17 [SVE] Fall back on DAG ISel at -O0 when encountering scalable types
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:

1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
   vectors.

For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.

I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to

  CodeGen/AArch64/GlobalISel/arm64-fallback.ll

that demonstrate we correctly fall back to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.

Differential Revision: https://reviews.llvm.org/D81557
2020-06-19 10:57:00 +01:00
Jay Foad 7cdf4326a8 [LiveIntervals] Fix early-clobber handling in handleMoveUp
Without this fix, handleMoveUp can create an invalid live range like
this:

[98904e,98908r:0)[98908e,227504r:1)

where the two segments overlap, but only because we have lost the "e"
(early-clobber) on the end point of the first segment.
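
Why losing the "e" matters follows from the slot ordering. The sketch below is a simplified model of LLVM's SlotIndex, assuming the sub-slot order Block < EarlyClobber ('e') < Register ('r') < Dead ('d') and half-open [start, end) segments; the types are hypothetical. A segment ending at 98908r overlaps one starting at 98908e, whereas ending at 98908e leaves the two segments merely adjacent.

```cpp
#include <cassert>
#include <tuple>

// Simplified model of SlotIndex sub-slots, in order within one instruction.
enum class Slot { Block, EarlyClobber, Register, Dead };

struct Index {
  unsigned Instr;
  Slot S;
  bool operator<(const Index &O) const {
    return std::tie(Instr, S) < std::tie(O.Instr, O.S);
  }
};

// Half-open live segment [Start, End).
struct Segment {
  Index Start, End;
};

bool overlaps(const Segment &A, const Segment &B) {
  return A.Start < B.End && B.Start < A.End;
}
```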

Differential Revision: https://reviews.llvm.org/D82110
2020-06-19 10:17:04 +01:00
David Sherwood 7edc7f6edb [CodeGen] Fix SimplifyDemandedBits for scalable vectors
For now I have changed SimplifyDemandedBits and its various callers
to assume we know nothing for scalable vectors and to ignore the
demanded bits completely. I have also done something similar for
SimplifyDemandedVectorElts. These changes fix up lots of warnings
due to calls to EVT::getVectorNumElements() for types with scalable
vectors. These functions are all used for optimisations, rather than
functional requirements. In future we can revisit this code if
there is a need to improve code quality for SVE.

Differential Revision: https://reviews.llvm.org/D80537
2020-06-19 07:59:35 +01:00
David Sherwood 9e811b0d93 [CodeGen] Fix ComputeNumSignBits for scalable vectors
When trying to calculate the number of sign bits for scalable vectors
we should just bail out for now and pretend we know nothing.

Differential Revision: https://reviews.llvm.org/D81093
2020-06-19 07:58:42 +01:00
Vitaly Buka fcd67665a8 [StackSafety] Add "Must Live" logic
Summary:
Extend StackLifetime with an option to calculate liveness
where alloca is only considered alive on basic block entry
if all non-dead predecessors had it alive at terminators.

Depends on D82043.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82124
2020-06-18 16:53:37 -07:00
Nathan James 8b0df1c1a9 [NFC] Refactor Registry loops to range for 2020-06-19 00:40:10 +01:00
Matt Arsenault 95605b784b AMDGPU/GlobalISel: Implement computeKnownAlignForTargetInstr
We probably need to move where intrinsics are lowered to copies to
make this useful.
2020-06-18 17:28:00 -04:00
Matt Arsenault 7f8b2e1b91 GlobalISel: Pass LegalizerHelper to custom legalize callbacks
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.

This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add the common
parameters back in addition to the helper.

I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.

This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
2020-06-18 17:17:38 -04:00
Alexandre Ganea 2ae0df5be7 [CodeView] Revert 8374bf4363 and 403f953792
This reverts:
8374bf4363 [CodeView] Fix generated command-line expansion in LF_BUILDINFO. Fix the 'pdb' entry which was previously a null reference, now an empty string.
403f953792 [CodeView] Add full repro to LF_BUILDINFO record

This is causing the lld/test/COFF/pdb-relative-source-lines.test to fail: http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/1096/steps/test-check-all/logs/FAIL%3A%20lld%3A%3Apdb-relative-source-lines.test
And clang/test/CodeGen/debug-info-codeview-buildinfo.c fails as well: http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/33346/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Adebug-info-codeview-buildinfo.c
2020-06-18 16:18:46 -04:00
Simon Pilgrim 2474421398 [TargetLowering] SimplifyMultipleUseDemandedBits - drop already extended ISD::SIGN_EXTEND_INREG nodes.
If the source of the SIGN_EXTEND_INREG node is already sign extended, use the source directly.
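
A scalar sketch of the idea on 32-bit values, using hypothetical helpers rather than TargetLowering's API: sign_extend_inreg from FromBits bits is a no-op when the source already has at least 32 - FromBits + 1 sign bits, so the node can be replaced by its source. The code assumes arithmetic right shift of negative values, which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Count leading bits equal to the sign bit, including the sign bit itself
// (what ComputeNumSignBits would report for a known constant).
unsigned numSignBits(std::int32_t V) {
  std::uint32_t U = static_cast<std::uint32_t>(V);
  std::uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// sign_extend_inreg(V, FromBits): sign-extend the low FromBits bits to 32.
std::int32_t signExtendInReg(std::int32_t V, unsigned FromBits) {
  assert(FromBits >= 1 && FromBits <= 32);
  unsigned Shift = 32 - FromBits;
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(V) << Shift) >> Shift;
}

// The node is redundant when the source already has enough sign bits.
bool sextInRegIsNoop(std::int32_t V, unsigned FromBits) {
  return numSignBits(V) >= 32 - FromBits + 1;
}
```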
2020-06-18 16:41:08 +01:00
Alexandre Ganea 8374bf4363 [CodeView] Fix generated command-line expansion in LF_BUILDINFO. Fix the 'pdb' entry which was previously a null reference, now an empty string.
Previously, the DIA SDK didn't like the empty reference in the 'pdb' entry.
2020-06-18 10:07:30 -04:00
Alexandre Ganea 403f953792 [CodeView] Add full repro to LF_BUILDINFO record
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).

Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). Similarly, MSVC doesn't store the provided command line, but an expanded version (roughly their equivalent of CC1) which is also freestanding.

For more information see PR36198 and D43002.

Differential Revision: https://reviews.llvm.org/D80833
2020-06-18 09:17:15 -04:00
Lucas Prates a255931c40 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, such as
arguments being stored in the wrong bits on big-endian architectures
and missing overflow detection in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type, the backend is able to properly enforce the AAPCS rules for
those arguments, making sure they are stored in the proper bits of the
registers and performing the necessary floating point conversions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
Jeremy Morse 3626eba11f [NFC][LiveDebugValues] Document how LiveDebugValues operates
We're missing a plain English explanation of how this pass is supposed
to operate -- add one to the file comment.

Differential Revision: https://reviews.llvm.org/D80929
2020-06-18 10:54:09 +01:00
David Sherwood 7e30ef77f6 [CodeGen] Fix warnings in getVectorTypeBreakdown
Added a NextPowerOf2() routine to TypeSize and rewrote the code
in getVectorTypeBreakdown to avoid generating warnings.
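
For reference, a sketch of the rounding involved. It follows the strictly-greater convention of llvm::NextPowerOf2 in MathExtras.h (so 4 rounds up to 8). The code and the SketchTypeSize type are illustrative assumptions rather than the actual TypeSize implementation; the idea shown is simply to round the known-minimum size and keep the scalable flag.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two strictly greater than A: smear the highest set bit
// into all lower positions, then add one (wraps to 0 on overflow).
std::uint64_t nextPowerOf2(std::uint64_t A) {
  A |= (A >> 1);
  A |= (A >> 2);
  A |= (A >> 4);
  A |= (A >> 8);
  A |= (A >> 16);
  A |= (A >> 32);
  return A + 1;
}

// Hypothetical stand-in for TypeSize: a known-minimum size plus a flag
// saying whether it is scaled by the runtime vscale.
struct SketchTypeSize {
  std::uint64_t MinSize;
  bool Scalable;
  // Round the known-minimum part, keep the scalable flag unchanged.
  SketchTypeSize nextPowerOf2Size() const { return {nextPowerOf2(MinSize), Scalable}; }
};
```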

Differential Revision: https://reviews.llvm.org/D81578
2020-06-18 09:54:16 +01:00
David Sherwood 65912a9768 [CodeGen] Fix warnings in foldCONCAT_VECTORS
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.

I discovered these warnings when compiling the structured loads tests in
this file:

  test/CodeGen/AArch64/sve-intrinsics-loads.ll

Differential Revision: https://reviews.llvm.org/D81936
2020-06-18 09:29:37 +01:00
Nick Desaulniers e7816f263b [InlineSpiller] add assert about spills post terminators
Summary:
This invariant is being violated in the test case
https://reviews.llvm.org/D77849, related to the use of the relatively
new ability for callbr to have return values, and MachineBasicBlocks
with INLINEASM_BR terminators to emit live out register defs.

As noted in the comment, this triggers invariant violations in
MachineVerifier via `llc -verify-machineinstrs` or
`llc -verify-regalloc`, since only MachineInstrs that are terminators
are allowed to follow the first terminator.

https://reviews.llvm.org/D75098 may rework this very assertion if we're
spilling via a (proposed) TCOPY MachineInstr.

Reviewers: void, efriedma, arsenm

Reviewed By: efriedma

Subscribers: qcolombet, wdng, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78166
2020-06-17 11:51:58 -07:00
Davide Italiano 1cbaf847ab [CGP] Reset the debug location when promoting zext(s).
When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience causing an unexpected
jump in stepping at -Og.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).

Differential Revision: https://reviews.llvm.org/D81437
2020-06-17 11:13:13 -07:00
Ian Levesque 7c7c8e0da4 [xray] Option to omit the function index
Summary:
Add a flag to omit the xray_fn_idx to cut size overhead and relocations
roughly in half at the cost of reduced performance for single function
patching.  Minor additions to compiler-rt support per-function patching
without the index.

Reviewers: dberris, MaskRay, johnislarry

Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D81995
2020-06-17 13:49:01 -04:00
Vitaly Buka d812efb121 [SafeStack,NFC] Fix names after files move
Summary: Depends on D81831.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81832
2020-06-17 01:08:40 -07:00
Vitaly Buka 6754a0e2ed [SafeStack,NFC] Move SafeStackColoring code
Summary:
This code is going to be used in StackSafety.
This patch is a file move with minimal changes. Identifiers
will be fixed in the followup patch.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81831
2020-06-17 01:07:47 -07:00
Aaron Smith 7e01675ea5 [SelectionDAG] Add MVT::bf16 to getConstantFP()
Summary:
This was probably overlooked in recent bfloat patches.
Needed to handle bf16 constants in SelectionDAG.

  ConstantFP:bf16<APFloat(0)>

Reviewers: stuij

Reviewed By: stuij

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81779
2020-06-16 15:10:05 -07:00
Matt Arsenault e4f19d1dda GlobalISel: Fix not failing on widening G_INSERT_VECTOR_ELT
This doesn't actually handle type idx 0, but was reporting Legalized
on it. No test changes because nothing was trying to use this.
2020-06-16 15:48:57 -04:00
Matt Arsenault 8a3340d25d GlobalISel: Use early return and reduce indentation 2020-06-16 14:47:08 -04:00
Fangrui Song 4799fb63b5 [GlobalISel] Delete unused variable after r353432 2020-06-16 08:32:09 -07:00
Jessica Paquette 5a4c3f6b06 [GlobalISel] Look through extends etc in CombinerHelper::matchConstantOp
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.

e.g.

https://godbolt.org/z/Bjc8cw

To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.

Differential Revision: https://reviews.llvm.org/D81875
2020-06-15 16:34:25 -07:00
Amara Emerson fc905ae003 [GlobalISel] Don't emit multiply by magic constant for zero memset values. 2020-06-15 14:42:14 -07:00
Davide Italiano c2dccf9d5e [CodeGenPrepare] Reset the debug location when promoting trunc(s)
The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.

Differential Revision: https://reviews.llvm.org/D81879
2020-06-15 14:25:43 -07:00
Jessica Paquette 1ac8451a9b [GlobalISel] Simplify G_ADD when it has (0-X) on the LHS or RHS
This implements the following combines:

((0-A) + B) -> B-A
(A + (0-B)) -> A-B
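
Both rewrites are exact for wrapping two's-complement integers, which is what G_ADD and G_SUB operate on. A quick check of the identities follows, modelling a 32-bit register with uint32_t (whose arithmetic wraps modulo 2^32); the function names are only illustrative.

```cpp
#include <cassert>
#include <cstdint>

std::uint32_t addNegLHS(std::uint32_t A, std::uint32_t B) { return (0u - A) + B; } // ((0-A) + B)
std::uint32_t subBA(std::uint32_t A, std::uint32_t B) { return B - A; }            // B-A
std::uint32_t addNegRHS(std::uint32_t A, std::uint32_t B) { return A + (0u - B); } // (A + (0-B))
std::uint32_t subAB(std::uint32_t A, std::uint32_t B) { return A - B; }            // A-B
```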

Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.

I noticed that add combines are some of the most commonly hit across CTMark
(via print statements when they fire), so I'm porting over some of the obvious
ones.

This gives some minor code size improvements on CTMark at -O3 on AArch64.

Differential Revision: https://reviews.llvm.org/D77453
2020-06-15 09:43:24 -07:00
Dominik Montada 87e5742654 [NFC] Add braces to if-statement in MachineVerifier 2020-06-15 16:33:56 +02:00
Matt Arsenault 33e9086501 GlobalISel: Support lowering vector->vector G_BITCAST
Extract subvectors and cast to the result element type before
remerging.
2020-06-15 07:36:30 -04:00
Dominik Montada c87bf29149 [MachineVerifier][GlobalISel] Check that branches have a MBB operand or are declared indirect. Add missing properties to G_BRJT, G_BRINDIRECT
Summary:
Teach MachineVerifier to check branches for MBB operands if they are not declared indirect.

Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`.
Without these, `MachineInstr.isConditionalBranch()` was giving a
false-positive for those instructions.

Reviewers: aemerson, qcolombet, dsanders, arsenm

Reviewed By: dsanders

Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81587
2020-06-15 11:17:09 +02:00
Vitaly Buka ca2dcbd030 [SafeStack,NFC] Make StackColoring read-only
Move code which removes markers out of StackColoring.
2020-06-14 23:05:43 -07:00
Vitaly Buka c6426e2657 [SafeStack,NFC] Remove unneeded branch 2020-06-14 23:05:43 -07:00
Vitaly Buka 7282da1ea8 [SafeStack,NFC] Fix naming style 2020-06-14 23:05:42 -07:00
Vitaly Buka 2f5e535a84 [SafeStack,NFC] Cleanup LiveRange interface 2020-06-14 23:05:42 -07:00
Vitaly Buka adefa9ca2e [SafeStack,NFC] "const" cleanup 2020-06-14 23:05:42 -07:00
Vitaly Buka fb1e0f324f [SafeStack,NFC] Add BlockLifetimeInfo constructor 2020-06-14 23:05:42 -07:00
Vitaly Buka 645058036a [SafeStack,NFC] Use IntrinsicInst instead of Instruction 2020-06-14 23:05:41 -07:00
Vitaly Buka f8e411656e [SafeStack,NFC] Move ClColoring into SafeStack.cpp
This allows the code to be reused in other components.
2020-06-14 23:05:41 -07:00
Vitaly Buka 05590a9cb8 [SafeStack,NFC] Move unconditional code into constructor
Prepare to move ClColoring from SafeStackCode to SafeStackLayout.
This will allow the code to be reused in other components.
2020-06-14 23:05:41 -07:00
Chen Zheng bd7096b977 [PowerPC] fma chain break to expose more ILP
This patch tries to reassociate two patterns related to FMA to expose
more ILP on PowerPC.

// Pattern 1:
//   A =  FADD X,  Y          (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMA  X,  M21,  M22
//   B =  FMA  Y,  M31,  M32
//   C =  FADD A,  B

// Pattern 2:
//   A =  FMA  X,  M11,  M12  (Leaf)
//   B =  FMA  A,  M21,  M22  (Prev)
//   C =  FMA  B,  M31,  M32  (Root)
// -->
//   A =  FMUL M11,  M12
//   B =  FMA  X,  M21,  M22
//   D =  FMA  A,  M31,  M32
//   C =  FADD B,  D
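
The two patterns can be written out with std::fma (a*b + c with a single rounding). The sketch below is illustrative, not the PowerPC machine-combiner code. In Pattern 1 the two FMAs of the rewritten form are independent, so the dependent chain shrinks from three operations to two. In Pattern 2 the FMUL and the FMA on X no longer depend on each other. Reassociation can change floating-point rounding in general, so the test inputs are small integer values for which both forms are exact.

```cpp
#include <cassert>
#include <cmath>

// Pattern 1, original chain: FADD -> FMA -> FMA, all dependent.
double pattern1Chain(double X, double Y, double M21, double M22, double M31, double M32) {
  double A = X + Y;                 // Leaf
  double B = std::fma(M21, M22, A); // Prev
  return std::fma(M31, M32, B);     // Root
}

// Pattern 1, reassociated: two independent FMAs, then one FADD.
double pattern1Reassoc(double X, double Y, double M21, double M22, double M31, double M32) {
  double A = std::fma(M21, M22, X);
  double B = std::fma(M31, M32, Y);
  return A + B;
}

// Pattern 2, original chain: three dependent FMAs.
double pattern2Chain(double X, double M11, double M12, double M21, double M22,
                     double M31, double M32) {
  double A = std::fma(M11, M12, X);
  double B = std::fma(M21, M22, A);
  return std::fma(M31, M32, B);
}

// Pattern 2, reassociated: the FMUL and the FMA on X are independent.
double pattern2Reassoc(double X, double M11, double M12, double M21, double M22,
                       double M31, double M32) {
  double A = M11 * M12;
  double B = std::fma(M21, M22, X);
  double D = std::fma(M31, M32, A);
  return B + D;
}
```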

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D80175
2020-06-15 00:00:04 -04:00
Qiu Chaofan f8ef7c99a0 [DAGCombiner] Require ninf for division estimation
The current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (the result is NaN instead of the expected inf).
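
A sketch of why 1.0/0.0 goes wrong. A reciprocal estimate is usually refined with a Newton-Raphson step R1 = R0 * (2 - D * R0). Assuming the initial estimate for D == 0 is +inf (modelled here with an exact 1/D rather than a hardware estimate instruction), the step computes 0 * inf = NaN, and the NaN propagates into the quotient. Requiring ninf rules this case out.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement of a reciprocal estimate R0 for 1/D.
double refineReciprocal(double D, double R0) { return R0 * (2.0 - D * R0); }

// N / D computed as N * refined(1/D). The initial "estimate" is faked with
// an exact 1/D, which is +inf for D == 0.
double divideViaEstimate(double N, double D) {
  double R0 = 1.0 / D;
  return N * refineReciprocal(D, R0);
}
```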

This change also exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to check whether the
divisor is a constant, but that doesn't work after legalization on some
platforms. This patch restricts the method to run before LegalDAG.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D80542
2020-06-14 22:58:22 +08:00
Amanieu d'Antras 6973125cb7 Fix FastISel dropping srcloc metadata from InlineAsm
Summary:
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060

I've also added the Extra_IsConvergent flag which was missing from FastISel.

Reviewers: echristo

Reviewed By: echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80759
2020-06-13 16:52:37 +01:00
Roman Lebedev 17f7654152 [NFCI][MachineCopyPropagation] invalidateRegister(): use SmallSet<8> instead of DenseSet.
This decreases the time consumed by the pass [during RawSpeed unity build]
by 25% (0.0586 s -> 0.04388 s).

While that isn't really impressive overall, that wasn't the goal here.
The memory results here are noticeable.
The baseline results are:
```
total runtime: 55.65s.
calls to allocation functions: 19754254 (354960/s)
temporary memory allocations: 4951609 (88974/s)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.79MB
total memory leaked: 198.01MB
```
While with this patch the results are:
```
total runtime: 55.37s.
calls to allocation functions: 19068237 (344403/s)   # -3.47 %
temporary memory allocations: 4261772 (76974/s)      # -13.93 % (!!!)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.73MB
total memory leaked: 198.01MB
```

So we get rid of *a lot* of temporary allocations.

Using `SmallSet<8>` makes sense to me because at least here
for x86 BdVer2, the size of that set is *never* more than 3,
over all of llvm test-suite + RawSpeed.

The story might be different on other targets,
not sure if it will ever justify a whole DenseSet,
but if it does SmallDenseSet might be a compromise.
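
The allocation savings come from small-size optimisation. Below is a simplified sketch in the spirit of llvm::SmallSet, not LLVM's implementation (TinySet is a hypothetical name). Up to N elements live inline and are searched linearly, and only past N does the set spill to a heap-backed std::set. With the set rarely holding more than 3 elements, the inline path is almost always taken.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

template <typename T, std::size_t N> class TinySet {
  std::array<T, N> Inline{}; // inline storage, no heap allocation
  std::size_t Size = 0;      // number of inline elements in use
  std::set<T> Big;           // used only after growing past N elements

public:
  bool isSmall() const { return Big.empty(); }

  bool count(const T &V) const {
    if (isSmall())
      return std::find(Inline.begin(), Inline.begin() + Size, V) != Inline.begin() + Size;
    return Big.count(V) != 0;
  }

  // Returns true if V was newly inserted.
  bool insert(const T &V) {
    if (!isSmall())
      return Big.insert(V).second;
    if (count(V))
      return false;
    if (Size < N) {
      Inline[Size++] = V;
      return true;
    }
    // Spill: move the inline elements into the heap-backed set.
    Big.insert(Inline.begin(), Inline.begin() + Size);
    Size = 0;
    return Big.insert(V).second;
  }
};
```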
2020-06-12 23:10:54 +03:00
Michael Liao e7b920e6fe [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
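
The fold is only valid when (or x, c1) is equivalent to (add x, c1), which holds when x and c1 have no set bits in common, because the addition then produces no carries. A small check of that condition follows, with illustrative names and 32-bit wrapping arithmetic. It does not show how the combiner proves the bits are disjoint.

```cpp
#include <cassert>
#include <cstdint>

// (or x, c1) == (add x, c1) whenever x and c1 share no set bits.
bool orIsAdd(std::uint32_t X, std::uint32_t C1) { return (X & C1) == 0; }

std::uint32_t addOfOr(std::uint32_t X, std::uint32_t C1, std::uint32_t C2) { return (X | C1) + C2; }
std::uint32_t foldedAdd(std::uint32_t X, std::uint32_t C1, std::uint32_t C2) { return X + (C1 + C2); }
```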
Reviewers: arsenm

Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81708
2020-06-12 13:53:08 -04:00
Matt Arsenault 350ee7fb3f GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering 2020-06-12 10:33:23 -04:00
Simon Pilgrim 5509e2cc2e [DAG] foldAddSubOfSignBit - add support for non-uniform vector constants 2020-06-12 14:58:15 +01:00
diggerlin c6be3ea524 [NFC] clean up the AsmPrinter::emitLinkage for AIX part
SUMMARY:

Since AIX linkage is now handled in PPCAIXAsmPrinter::emitLinkage() (https://reviews.llvm.org/D75866), AIX no longer goes through AsmPrinter::emitLinkage(), so we clean up the AIX-related code in AsmPrinter::emitLinkage().

Reviewers: Jason Liu

Differential Revision: https://reviews.llvm.org/D81613
2020-06-11 13:33:51 -04:00
Petar Avramovic bd3d951b8b AMDGPU/GlobalISel: Fix lower for f64->f16 G_FPTRUNC
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.

Differential Revision: https://reviews.llvm.org/D81666
2020-06-11 18:19:27 +02:00
Dominik Montada f24e2e9eeb [GlobalISel] fix crash in IRTranslator, MachineIRBuilder when translating @llvm.dbg.value intrinsic and using -debug
Summary:
Fix crash when using -debug caused by the GlobalISel observer trying to print
an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder
using buildInstr, which immediately inserts the instruction (triggering the print),
instead of using BuildMI to first build up the instruction and using
insertInstr when finished.

Add RUN-line to existing debug-insts.ll test with -debug flag set to make sure
no crash is happening.

Also fixed a missing %s in the 2nd RUN-line of the same test.

Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm

Reviewed By: arsenm

Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76934
2020-06-11 10:47:49 +02:00
David Sherwood bd97342a0c [CodeGen] Let computeKnownBits do something sensible for scalable vectors
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.

Differential Revision: https://reviews.llvm.org/D80437
2020-06-11 08:17:11 +01:00
Matt Arsenault 0671a4c508 RegAllocFast: Avoid unused method warning in release builds 2020-06-10 15:23:56 -04:00
Matt Arsenault 0f2af15c1b GlobalISel: Make default implementation of legalizeCustom unreachable
If the target explicitly requested custom legalization, it should be
required to implement this. Also move default legalizeIntrinsic
implementation into the header so it's next to the related
legalizeCustom.
2020-06-10 11:05:59 -04:00
Wang, Pengfei 6eb9eae010 [MS] Copy the symbols assigned to the former instruction when memory folding.
Memory folding replaced the old instruction without copying the symbols assigned to it, which resulted in build failures due to the lost symbols.

Reviewed by craig.topper

Differential Revision: https://reviews.llvm.org/D78471
2020-06-10 15:38:32 +08:00
diggerlin edd819c757 [AIX] supporting the visibility attribute for aix assembly
SUMMARY:

AIX assembly does not have the .hidden and .protected directives.
Currently, for a function or variable with a visibility attribute, LLVM generates .hidden or .protected, which the AIX assembler cannot recognize.
In AIX assembly, the visibility attribute is instead supported through pseudo-ops such as:
.extern Name [ , Visibility ]
.globl Name [, Visibility ]
.weak Name [, Visibility ]

In this patch, we implement the visibility attribute for global variables, functions, and extern functions.

For example:

extern __attribute__ ((visibility ("hidden"))) int
  bar(int* ip);
__attribute__ ((visibility ("hidden"))) int b = 0;
__attribute__ ((visibility ("hidden"))) int
  foo(int* ip){
   return (*ip)++;
}
Visibility for .comm linkage is not supported yet; we will handle it in a separate patch.
The "default" and "internal" visibility cases are also unsupported; we will implement them in a separate patch.

Reviewers: Jason Liu, hubert.reinterpretcast, James Henderson

Differential Revision: https://reviews.llvm.org/D75866
2020-06-09 16:15:06 -04:00
Matt Arsenault 32823091c3 GlobalISel: Set instr/debugloc before any legalizer action
It was annoying enough that every custom lowering needed to set the
insert point, but this was made worse since now these all needed to be
updated to setInstrAndDebugLoc. Consolidate these so every
legalization action has the right insert position by default.

This should fix dropping debug info in every custom AMDGPU
legalization.
2020-06-09 15:37:02 -04:00
Matt Arsenault b94c9e3b55 GlobalISel: Improve MachineIRBuilder construction
The current relationship between LegalizerHelper and MachineIRBuilder
confuses me, because the LegalizerHelper modifies the MachineIRBuilder
which it does not own. Constructing a LegalizerHelper destroys the
insert point, since the constructor calls setMF, which clears all the
fields. Try to separate these functions, so it's possible to construct
a LegalizerHelper from an existing MachineIRBuilder without losing the
insert point/debug loc.
2020-06-09 15:05:04 -04:00
Matt Arsenault babbf4441b GlobalISel: Move some trivial MIRBuilder methods into the header
The construction APIs for MachineIRBuilder don't make much sense, and
it's been annoying to sort through it with these trivial functions
separate from the declaration.
2020-06-09 15:04:48 -04:00
Matt Arsenault bb6cb6bfe4 GlobalISel: Remove redundant check in verifier
This was already checked earlier for all instructions.
2020-06-09 15:04:27 -04:00
Matt Arsenault 6eeac6ae33 GlobalISel: Fix double printing new instructions in legalizer
New instructions were getting printed both in createdInstr, and in the
final printNewInstrs, so it made it look like the same instructions
were created twice. This overall made reading the debug output
harder. Stop printing the initial construction and only print new
instructions in the summary at the end. This avoids printing the less
useful case where instructions are sometimes initially created with no
operands.

I'm not sure this is the correct instance to remove; now the visible
ordering is different. Now you will typically see the one erased
instruction message before all the new instructions in order. I think
this is the more logical view of typical legalization changes,
although it's mechanically backwards from the normal
insert-new-erase-old pattern.
2020-06-09 15:02:31 -04:00
David Green 2fea3fe41c [MachineScheduler] Update available queue on the first mop of a new cycle
If a resource can be held for multiple cycles in the schedule model
then an instruction can be placed into the available queue, another
instruction can be scheduled, but the first will not be taken back out if
the two instructions hazard. To fix this make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.

This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.

Differential Revision: https://reviews.llvm.org/D76909
2020-06-09 19:13:53 +01:00
Sanjay Patel 702cf93356 [DAGCombiner] allow more folding of fadd + fmul into fma
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.

This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.

'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
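To make the rewrite concrete, here is a source-level C++ sketch of both forms using `std::fma`. It only illustrates the algebra, not the DAGCombiner code, and the names `beforeFold`/`afterFold` are made up. With the exactly representable inputs in the usage below, both forms give the same result; in general the rounding can differ, which is why the 'contract' flag is required.

```cpp
#include <cassert>
#include <cmath>

// Unfolded: fadd (fma A, B, (fmul C, D)), N1  -> fmul + fma + fadd
double beforeFold(double A, double B, double C, double D, double N1) {
  double Mul = C * D;                  // fmul C, D
  double Inner = std::fma(A, B, Mul);  // fma A, B, (fmul C, D)
  return Inner + N1;                   // fadd ..., N1
}

// Folded: fma(A, B, fma(C, D, N1))  -> two fma operations
double afterFold(double A, double B, double C, double D, double N1) {
  return std::fma(A, B, std::fma(C, D, N1));
}
```

For example, `beforeFold(2, 3, 4, 5, 1)` and `afterFold(2, 3, 4, 5, 1)` both compute 2*3 + 4*5 + 1 = 27.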

Differential Revision: https://reviews.llvm.org/D80801
2020-06-09 10:41:27 -04:00
Guillaume Chatelet 800e100588 Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess"
This reverts commit f21c52667e.
2020-06-09 10:43:59 +00:00
Simon Wallis 4dba59689d [ARM] prologue instructions emitted for naked function with >64 byte argument
Summary:

The naked function attribute is meant to suppress all function
prologue/epilogue instructions.

On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.

Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().

Checking in ARMFrameLowering::determineCalleeSaves() is too late.

A test case is included.

Reviewers: llvm-commits, olista01, danielkiss

Reviewed By: danielkiss

Subscribers: kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80715

Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
2020-06-09 11:33:03 +01:00
Guillaume Chatelet 3b6196c9b3 [Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81374
2020-06-09 10:17:42 +00:00
Guillaume Chatelet f21c52667e [Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81379
2020-06-09 10:11:07 +00:00
Guillaume Chatelet e26ed6bdae Fix unused variable warning 2020-06-09 08:56:05 +00:00
Kang Zhang 1b6602275d [MachineVerifier] Add TiedOpsRewritten flag to fix verify two-address error
Summary:
Currently, MachineVerifier will attempt to verify that tied operands
satisfy register constraints as soon as the function is no longer in
SSA form. However, PHIElimination will take the function out of SSA
form while TwoAddressInstructionPass will actually rewrite tied operands
to match the constraints. PHIElimination runs first in the pipeline.
Therefore, whenever the MachineVerifier is run after PHIElimination,
it will encounter verification errors on any tied operands.

This patch adds a function property called TiedOpsRewritten that will be
set by TwoAddressInstructionPass and will control when the verifier checks
tied operands.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D80538
2020-06-09 07:39:42 +00:00
David Sherwood cc8872400c [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
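A minimal sketch of the idea, assuming a simplified model in which an illegal fixed vector is split into parts no wider than one legal register. `reducedAlignBytes` is a hypothetical helper working on byte counts; the actual getReducedAlign operates on EVTs and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: an illegal fixed vector of TotalBytes is split into
// parts of at most LegalBytes each (e.g. 16 bytes for a NEON Q register).
// The stack temporary only needs the alignment of one part.
uint64_t reducedAlignBytes(uint64_t TotalBytes, uint64_t LegalBytes) {
  uint64_t PartBytes = TotalBytes < LegalBytes ? TotalBytes : LegalBytes;
  // Alignments are powers of two: round the part size down to one.
  uint64_t Align = 1;
  while (Align * 2 <= PartBytes)
    Align *= 2;
  return Align;
}
```

With this model, a 32-byte `<32 x i8>` on a 16-byte register target yields 16, matching the NEON example above.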

In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.

I added a test to

  CodeGen/AArch64/build-one-lane.ll

that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.

Differential Revision: https://reviews.llvm.org/D80370
2020-06-09 08:10:17 +01:00
Yonghong Song 3eb465a329 [DebugInfo] Fix assertion for extern void type
Commit d77ae1552f ("[DebugInfo] Support to emit debugInfo
for extern variables") added support to emit debuginfo
for extern variables. Currently, only the BPF target emits
debuginfo for extern variables.

But if the extern variable has "void" type, the compilation will
fail.

  -bash-4.4$ cat t.c
  extern void bla;
  void *test() {
    void *x = &bla;
    return x;
  }
  -bash-4.4$ clang -target bpf -g -O2 -S t.c
  missing global variable type
  !1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
                                  isLocal: false, isDefinition: false)
  ...
  fatal error: error in backend: Broken module found, compilation aborted!
  PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace,
      preprocessed source, and associated run script.
  Stack dump:
  ...

The IR requires that a DIGlobalVariable have a valid type, and the
"void" type does not generate any type, hence the fatal error above.

Note that if the extern variable is defined as "const void", the
compilation will succeed.

-bash-4.4$ cat t.c
extern const void bla;
const void *test() {
  const void *x = &bla;
  return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
-bash-4.4$ cat t.ll
...
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
                                type: !6, isLocal: false, isDefinition: false)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: null)
...

Since "const void extern_var" is currently supported by the
debug info, it is natural that "void extern_var" should also
be supported. This patch disables the assertion for "void extern_var"
in the IR verifier and adds proper guarding when emitting a potentially
null debug info type to DWARF types.

Differential Revision: https://reviews.llvm.org/D81131
2020-06-08 13:43:18 -07:00
Andrew Litteken bb677cacc8 [SuffixTree][MachOpt] Factoring out Suffix Tree and adding Unit Tests
This moves the SuffixTree used in the Machine Outliner into Support, so that other outliners elsewhere in the compilation pipeline can use it.

Differential Revision: https://reviews.llvm.org/D80586
2020-06-08 12:44:18 -07:00
Hendrik Greving f3d8a93970 [ModuloSchedule] Support instructions with > 1 destination when walking canonical use.
Fixes a minor bug that led to finding the wrong register if the definition had more
than one register destination.
2020-06-08 11:43:59 -07:00
Jan-Willem Maessen 3610d31e7a [NFC] Fix quadratic LexicalScopes::constructScopeNest
We sometimes have functions with large numbers of sibling basic
blocks (usually with an error path exit from each one). This was
triggering quadratic behavior in this function: after visiting
each child, LLVM would re-scan the parent from the beginning. We
modify the work stack to record the next index to be worked on
alongside the pointer. This avoids the need to linearly search for
the next unfinished child.
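The work-stack change can be sketched as an iterative depth-first walk whose stack entries carry the next child index. The `Node` type and `numberNest` function are hypothetical stand-ins, not the actual LexicalScopes code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
  std::vector<Node *> Children;
  int DFSIn = 0, DFSOut = 0; // numbering assigned by the walk
};

// Each stack entry is (node, index of the next child to visit), so resuming
// a parent is O(1) instead of rescanning its children for the first
// unfinished one.
void numberNest(Node *Root) {
  int Counter = 0;
  std::vector<std::pair<Node *, std::size_t>> WorkStack;
  Root->DFSIn = Counter++;
  WorkStack.push_back({Root, 0});
  while (!WorkStack.empty()) {
    Node *N = WorkStack.back().first;
    std::size_t &Next = WorkStack.back().second;
    if (Next < N->Children.size()) {
      Node *Child = N->Children[Next++]; // advance before push_back
      Child->DFSIn = Counter++;
      WorkStack.push_back({Child, 0});   // Next is not used after this
    } else {
      N->DFSOut = Counter++;
      WorkStack.pop_back();
    }
  }
}
```

Each child is reached exactly once, so a parent with many sibling children costs linear rather than quadratic time.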

Differential Revision: https://reviews.llvm.org/D80029
2020-06-08 18:40:56 +01:00
Christopher Tetreault caa2fddce7 [SVE] Eliminate calls to default-false VectorType::get() from CodeGen
Reviewers: efriedma, c-rhodes, david-arm, spatel, craig.topper, aqjune, paquette, arsenm, gchatelet

Reviewed By: spatel, gchatelet

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80313
2020-06-08 10:26:10 -07:00
Guillaume Chatelet 54076610dc [Alignment][NFC] Deprecate dead code from CallingConvLower.h
Summary: This is a followup on D81196.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81362
2020-06-08 14:49:39 +00:00
Matt Arsenault 5f7e38d8f4 GlobalISel: Use Register 2020-06-08 10:15:53 -04:00
Matt Arsenault f13ba22227 GlobalISel: Remove unused header 2020-06-08 10:15:53 -04:00
Matt Arsenault f41994f85b GlobalISel: Make it clearer that regbank/class are mutually exclusive 2020-06-08 10:15:53 -04:00
Matt Arsenault c1d771dc4b GlobalISel: Simplify debug printing 2020-06-08 10:15:53 -04:00
Guillaume Chatelet 94b0c32a0b [Alignment][NFC] Migrate HandleByVal to Align
Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::HandleByVal` without marking it `override`.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81365
2020-06-08 10:50:27 +00:00
Sander de Smalen ae09670ee4 [CodeGen][SVE] CopyToReg: Split scalable EVTs that are not powers of 2
Scalable vectors cannot use 'BUILD_VECTOR', so it is necessary to
properly split and widen scalable vectors when passing them
to CopyToReg/CopyFromReg.

This functionality is added to TargetLoweringBase::getVectorTypeBreakdown().

This patch only adds support for 'splitting' scalable vectors that
are a multiple of some legal type, e.g.

      <vscale x 6 x i64> -> 3 x <vscale x 2 x i64>
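Assuming the split only applies when the element count is an exact multiple of a legal scalable type's element count, the number of parts is a plain division. `numScalableParts` is a hypothetical helper, not a TargetLoweringBase API.

```cpp
#include <cassert>

// Hypothetical helper: how many <vscale x LegalElts x T> registers hold a
// <vscale x NumElts x T> value. Returns 0 when NumElts is not a whole
// multiple of LegalElts (a case needing widening, not covered here).
unsigned numScalableParts(unsigned NumElts, unsigned LegalElts) {
  if (LegalElts == 0 || NumElts % LegalElts != 0)
    return 0;
  // vscale scales both types equally, so it cancels out of the ratio.
  return NumElts / LegalElts;
}
```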

Reviewers: efriedma, c-rhodes

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80139
2020-06-08 10:39:18 +01:00
James Y Knight 748d92b4d3 Simplify MachineVerifier's block-successor verification.
There are two properties we want to verify:

1. That the successors returned by analyzeBranch are in the CFG
   successor list, and
2. That there are no extraneous successors in the CFG successor
   list.

The previous implementation mostly accomplished this, but in a very
convoluted manner.
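Stated as code, the two properties amount to a set comparison. This sketch is a simplified model using integer block IDs and a hypothetical `successorsConsistent` function; it ignores successors that analyzeBranch does not report (such as EH pads), which the real verifier has to account for.

```cpp
#include <cassert>
#include <set>
#include <vector>

// BranchTargets: successors implied by analyzeBranch (including any
// fallthrough block). CFGSuccs: the block's CFG successor list.
bool successorsConsistent(const std::vector<int> &BranchTargets,
                          const std::vector<int> &CFGSuccs) {
  std::set<int> Expected(BranchTargets.begin(), BranchTargets.end());
  std::set<int> Actual(CFGSuccs.begin(), CFGSuccs.end());
  // Property 1: every branch target is in the CFG successor list.
  for (int BB : Expected)
    if (!Actual.count(BB))
      return false;
  // Property 2: the CFG successor list has no extraneous entries.
  for (int BB : Actual)
    if (!Expected.count(BB))
      return false;
  return true;
}
```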

Differential Revision: https://reviews.llvm.org/D79793
2020-06-06 22:30:51 -04:00
James Y Knight 1978309db1 MachineBasicBlock::updateTerminator now requires an explicit layout successor.
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky prospect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR in
particular would be problematic, because its successor blocks are not
distinguished from "normal" successors the way EHPads are.)

Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.

Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.

Differential Revision: https://reviews.llvm.org/D79605
2020-06-06 22:30:51 -04:00
Simon Pilgrim f14d4c9c54 EHPersonalities.h - reduce Triple.h include to forward declaration. NFC.
Move implicit include dependencies down to source files.
2020-06-06 15:48:31 +01:00
Sanjay Patel 302cc8a121 [DAGCombiner] clean-up FMA+FMUL folds; NFC
D80801 suggests some readability improvements before moving this block.
2020-06-06 10:32:54 -04:00
Nikita Popov cb5724c71e [CGP] Remove unnecessary MaybeAlign use (NFC)
Stores now always have an alignment.
2020-06-05 23:18:26 +02:00
Matt Arsenault eaa8af9322 GlobalISel: Add helper for constructing load from offset 2020-06-05 15:06:03 -04:00
Matt Arsenault 45e1a22a92 GlobalISel: Make known bits/alignment API more consistent
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.

Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.

Also stop using MaybeAlign.
2020-06-05 14:57:22 -04:00
Nikita Popov d370088611 [LiveDebugValues] Fix output stream (NFC)
This should dump to the provided Out, rather than dbgs(), though
they coincide in current usage.
2020-06-05 20:02:22 +02:00
Nikita Popov 6a53264926 [LiveDebugValues] Remove PendingInLocs (NFC)
PendingInLocs ends up having the same value as InLocs, just computed
a bit more indirectly. It is a leftover of a previous implementation
approach.

This patch drops PendingInLocs, as well as the Diff and Removed
calculations, which are no longer needed.

Differential Revision: https://reviews.llvm.org/D80868
2020-06-05 20:01:29 +02:00
Sander de Smalen 937cb7a8c7 Reland D80640: [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This reverts commit 9bcef270d7.
2020-06-05 18:09:31 +01:00
Sander de Smalen 9bcef270d7 Revert "[CodeGen][SVE] Calculate correct type legalization for scalable vectors."
Seems to break some buildbots, reverting the patch for now.

This reverts commit 164f4b9d26.
2020-06-05 16:03:52 +01:00
Sander de Smalen 164f4b9d26 [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This patch updates TargetLoweringBase::computeRegisterProperties and
TargetLoweringBase::getTypeConversion to support scalable vectors,
and make the right calls on how to legalise them. These changes are required
to legalise both MVTs and EVTs.

Reviewers: efriedma, david-arm, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80640
2020-06-05 15:20:34 +01:00
Denis Antrushin dae64d8f42 Fix build breakage caused by 66a1b83bf9 2020-06-05 15:53:09 +03:00
Denis Antrushin 66a1b83bf9 [TargetLowering][NFC] More efficient emitPatchpoint().
The current implementation of emitPatchpoint() is very inefficient:
for every FrameIndex operand it creates a new MachineInstr with
that operand expanded and all others copied as is.
Since PATCHPOINT/STATEPOINT instructions may have *a lot* of
FrameIndex operands, we end up creating and erasing many
machine instructions. But we can do it in a single pass, with only
one new machine instruction generated.
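The single-pass approach can be sketched on a plain operand list: build one new list, expanding each FrameIndex operand as it is encountered and copying everything else. The `Operand` type and the three-operand expansion are hypothetical simplifications, not the real STATEPOINT/PATCHPOINT operand encoding.

```cpp
#include <cassert>
#include <vector>

struct Operand {
  enum Kind { Reg, Imm, FrameIndex } K;
  int Val;
};

// One pass, one output list: no intermediate instruction per FrameIndex.
std::vector<Operand> expandFrameIndexes(const std::vector<Operand> &Ops) {
  std::vector<Operand> Out;
  Out.reserve(Ops.size());
  for (const Operand &Op : Ops) {
    if (Op.K != Operand::FrameIndex) {
      Out.push_back(Op); // copied as is
      continue;
    }
    // Hypothetical expansion: a marker immediate, the index, an offset.
    Out.push_back({Operand::Imm, 1});
    Out.push_back(Op);
    Out.push_back({Operand::Imm, 0});
  }
  return Out;
}
```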

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81181
2020-06-05 14:57:29 +03:00
Kerry McLaughlin 89fc0166f5 [CodeGen][SVE] Legalisation of extends with scalable types
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.

EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.

For example:

```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:

```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```
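A scalar model of what the two unpack instructions compute, assuming a single 16-byte vector (vscale = 1) and that UUNPKLO/UUNPKHI zero-extend the low and high halves of the byte elements respectively. The function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Low half: elements 0..7 of the i8 source, zero-extended to i16.
std::array<uint16_t, 8> uunpklo(const std::array<uint8_t, 16> &Src) {
  std::array<uint16_t, 8> Dst{};
  for (std::size_t I = 0; I < 8; ++I)
    Dst[I] = Src[I]; // uint8_t -> uint16_t is a zero extension
  return Dst;
}

// High half: elements 8..15 of the i8 source, zero-extended to i16.
std::array<uint16_t, 8> uunpkhi(const std::array<uint8_t, 16> &Src) {
  std::array<uint16_t, 8> Dst{};
  for (std::size_t I = 0; I < 8; ++I)
    Dst[I] = Src[I + 8];
  return Dst;
}
```

Together the two results hold the full 16 x i16 zext, which is why one illegal extend is split into a lo/hi pair.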

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79587
2020-06-05 12:08:42 +01:00
Philip Reames 4c735439fd [Statepoint] Migrate a few tests to gc-live bundle format and fix assert
The assert was missed in 0e7c7705; migrating the test revealed the problem.
2020-06-04 18:15:58 -07:00
Vedant Kumar 198762680e [LiveDebugValues] Cache LexicalScopes::getMachineBasicBlocks, NFCI
Summary:
Cache the results from getMachineBasicBlocks in LexicalScopes to speed
up UserValueScopes::dominates queries.  This replaces the caching done
in UserValueScopes. Compared to the old caching method, this reduces
memory traffic when a VarLoc is copied (e.g. when a VarLocMap grows),
and enables caching across basic blocks.
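The caching idea is ordinary memoization keyed by scope. In this sketch, `BlockSetCache`, its integer scope keys, and the stand-in `compute` function are hypothetical; they are not the LexicalScopes interface.

```cpp
#include <cassert>
#include <map>
#include <set>

class BlockSetCache {
public:
  // Returns the cached block set for a scope, computing it on first use.
  const std::set<int> &get(int ScopeID) {
    auto It = Cache.find(ScopeID);
    if (It != Cache.end())
      return It->second;
    ++Misses;
    return Cache.emplace(ScopeID, compute(ScopeID)).first->second;
  }
  int misses() const { return Misses; }

private:
  // Stand-in for the expensive query (getMachineBasicBlocks in the patch).
  static std::set<int> compute(int ScopeID) { return {ScopeID, ScopeID + 1}; }

  std::map<int, std::set<int>> Cache; // std::map keeps references stable
  int Misses = 0;
};
```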

When compiling sqlite 3.5.7 (CTMark version), this patch reduces the
number of calls to getMachineBasicBlocks from 10,207 to 1,093. I also
measured a small compile-time reduction (~ 0.1% of total wall time, on
average, on my machine).

As a drive-by, I made the DebugLoc in UserValueScopes a const reference
to cut down on MetadataTracking traffic.

Reviewers: jmorse, Orlando, aprantl, nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80957
2020-06-04 16:58:45 -07:00
Matt Arsenault af867b7850 DAG: Change computeKnownBitsForFrameIndex to be usable by GISel
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
2020-06-04 10:50:26 -04:00
Matt Arsenault 931a68f26b RegAllocFast: Remove dead code 2020-06-04 09:38:31 -04:00
Sanjay Patel 652b3757c8 [x86] add test/code comment for chain value use (PR46195); NFC 2020-06-04 09:15:17 -04:00
Simon Pilgrim adf10dcf2e [DAG] scalarizeBinOpOfSplats - extract from the source of splat vector (PR46189)
D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats: we were extracting from the splatted vector result instead of the source. The splat index is only valid for the source vector, not the result, which may contain undefs, including at the splat index.
2020-06-04 11:58:59 +01:00
Tim Northover 87e24c3200 Revert "[DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI"
This reverts commit 21dadd774f.

In at least PromoteIntBinOps, they wanted to know about users of *all* values
produced by the node not just the integer being promoted. For example not
replacing chain users if the operation was a load breaks the ordering of the
DAG.
2020-06-04 11:53:14 +01:00
Madhur Amilkanthwar b3cff3c720 Utility to dump .dot representation of SelectionDAG without firing viewer
Summary:
This patch adds support for dumping the .dot
representation of a SelectionDAG. It is motivated by the fact that
a developer may want to just dump the graph to
a predictable path with a simple name for comparison.
The existing utilities (i.e. viewGraph) are overkill
for this purpose, hence this patch adds the required support
while using the core routines from GraphWriter.

Example usage: DAG.dumpDotGraph("/tmp/graph.dot", "MyGraph")
will create /tmp/graph.dot file when DAG is an
object of SelectionDAG class.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80711
2020-06-04 11:51:48 +05:30
Philip Reames ab6779bbd8 [Statepoint] Remove last of old ImmutableStatepoint code
To do so, I had to sink the old-school inline operand handling into GCStatepointInst, which is non-ideal. This code should be removed shortly, and I was able to at least clean it up a bunch.
2020-06-03 20:31:17 -07:00
Philip Reames 91dd2f2536 [Statepoint] Delete more dead code from old wrappers
The verify() routine duplicates IR/Verifier.cpp checks, so while not technically dead it doesn't add any value either.
2020-06-03 20:10:30 -07:00
Matt Arsenault ed5017e153 GlobalISel: Start defining strict FP instructions
The AMDGPU lowering for unconstrained G_FDIV sometimes needs to
introduce a mode switch in the middle, so it's helpful to have
constrained instructions available to legalize this. Right now nothing
is preventing reordering of the mode switch with the other
instructions in the expansion.
2020-06-03 20:46:37 -04:00
Quentin Colombet ccb3c8e861 [RegisterCoalescer] Update empty subranges when rematerializing
When we rematerialize a value as part of the coalescing, we may
widen the register class of the destination register.
When this happens, updateRegDefUses may create additional subranges
to account for the wider register class.
The created subranges are empty and if they are not defined by
the rematerialized instruction we clean them up.
However, if they are defined by the rematerialized instruction but
unused, we failed to flag them as dead definition and would leave
them as empty live-range.
This is wrong because empty live-ranges don't interfere with anything,
thus if we don't fix them, we would fail to account that the
rematerialized instruction clobbers some lanes.

E.g., let us consider the following pseudo code:
def.lane_low64:reg128 = ldimm
newdef:reg32 = COPY def.lane_low64_low32

When rematerialization happens for newdef, we end up with:
newdef.lane_low64:reg128 = ldimm
 = use newdef.lane_low64_low32

Let's look at the live interval of newdef.
Before rematerialization, we would get:
newdef [defIdx, useIdx:0) 0@defIdx

Right after updateRegDefUses, newdef's register class is widened to reg128
and the subrange definitions will be augmented to fill the subreg that
is used at the definition point, here lane_low64.
The resulting live interval would be:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)

Before this patch this would be the final status of the live interval.
Therefore we miss that lane_low64_high32 is actually live on the
definition point of newdef.

With this patch, after rematerializing, we check all the added subranges
and for the ones that are defined but empty, we flag them as dead def.
Thus, in that case, newdef would look like this:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)

This fixes https://www.llvm.org/PR46154
2020-06-03 17:10:55 -07:00
Matt Arsenault 3866e0a563 GlobalISel: Fail expansion of G_DYN_STACKALLOC for StackGrowsUp 2020-06-03 19:56:07 -04:00
Philip Reames 382b3023cb [Statepoints][CGP] Minor parameter type cleanup 2020-06-03 16:00:38 -07:00
Matt Arsenault 66251f7e1d RegAllocFast: Record internal state based on register units
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.

Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
2020-06-03 16:51:46 -04:00
Victor Huang 3abe7aca45 [CodeGen] Enable tail call position check for speculatable functions
The function isInTailCallPosition (in Analysis.cpp) only checks whether
a call is in a tail call position if the call has side effects, accesses
memory, or is not safe to speculatively execute. Therefore, a call to a
speculatable function skips the tail call position check and may be
improperly tail called when it is not in a tail call position. This patch
enables the tail call position check for speculatable functions.

Differential Revision: https://reviews.llvm.org/D80661
2020-06-03 10:37:45 -05:00
Kang Zhang 2cc77b2b8a [LiveVariables] Don't set undef reg PHI used as live for FromMBB
Summary:
Patch D73152 added a new function, LiveVariables::addNewBlock.
This function marks the register used by a PHI as live in the MBB
that the register comes from.
However, it can cause LiveVariables verification to fail when the
source register in the PHI is undef.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D80077
2020-06-03 15:25:30 +00:00
Henry Kao c57e41c000 [CodeGen][SVE] Replace deprecated calls in getCopyFromPartsVector()
Summary: Replaced getVectorNumElements() with getVectorElementCount(). Added operator overloads for class ElementCount. Fixes warning in several AArch64 unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80826
2020-06-03 11:20:02 -04:00
Simon Pilgrim ea80b40669 [DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits.
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.
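A sketch of why the fold is sound, on concrete i32 values: if X has S sign bits and is shifted left by C < S, the top S - C bits of both X and X << C are copies of the sign bit, so a use demanding only those bits can read X directly. `numSignBits` and `canPeekThroughShl` are hypothetical helpers; the real combine reasons about unknown values via ComputeNumSignBits.

```cpp
#include <cassert>
#include <cstdint>

// Count of leading bits equal to the sign bit, including the sign bit.
unsigned numSignBits(int32_t X) {
  uint32_t U = static_cast<uint32_t>(X);
  uint32_t Sign = U >> 31;
  unsigned N = 1;
  while (N < 32 && ((U >> (31 - N)) & 1) == Sign)
    ++N;
  return N;
}

// True if every demanded bit lies in the top (S - C) bits, where X and
// (X << C) are guaranteed to agree.
bool canPeekThroughShl(int32_t X, unsigned C, uint32_t DemandedMask) {
  unsigned S = numSignBits(X);
  if (C >= S)
    return false;
  unsigned Keep = S - C;                          // 1..32
  uint32_t TopMask = ~uint32_t(0) << (32 - Keep); // shift amount 0..31
  return (DemandedMask & ~TopMask) == 0;
}
```

For X = -3 (30 sign bits) shifted by 4, any mask within the top 26 bits qualifies.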

This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.

Differential Revision: https://reviews.llvm.org/D80869
2020-06-03 16:11:54 +01:00
Simon Pilgrim c438b257f1 [DAG] GetDemandedBits - don't bother asserting for a non-null cast<> result. NFC.
cast<> will assert on failure anyhow.

This lets us fold the cast<> with the getAPIntValue() that uses it.
2020-06-03 12:43:07 +01:00
Simon Pilgrim 7a96c181d0 TargetFrameLowering.h - remove unnecessary includes. NFC.
Move TargetFrameLowering.h include to the top of the TargetFrameLoweringImpl.cpp includes (clang-format doesn't do this by default as the filenames don't match).
2020-06-03 11:12:42 +01:00
Kadir Cetinkaya c5468253aa [llvm] Fix unused variable warnings 2020-06-03 11:49:01 +02:00
Djordje Todorovic dd1bc59b72 [CSInfo][MIPS][DwarfDebug] Add support for delay slots
This adds call site info support for call instructions with delay slots.
It searches for instructions inside the call delay slot that load values
into parameter forwarding registers.
The return address of the call points to the instruction after the call
delay slot, which is not the one immediately after the call instruction.

Patch by Nikola Tesic

Differential revision: https://reviews.llvm.org/D78107
2020-06-03 11:25:17 +02:00
Eric Christopher 153a24ab0f Undo initialization of TRI in CGP as this is unconditionally initialized
later.
2020-06-02 15:08:54 -07:00
Kadir Cetinkaya af86a10bad [llvm] Fix unused variable warning 2020-06-02 22:46:24 +02:00
Eric Christopher 971459c3ef Fix up clang-tidy warnings around null and pointers. 2020-06-02 13:24:20 -07:00
Amy Kwan a3ada630d8 [DAGCombiner] Combine shifts into multiply-high
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal
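For i32 operands widened to i64, the matched pattern corresponds to the following source-level computation of the high half of the product. This illustrates the semantics of MULHS/MULHU, not the DAGCombiner implementation; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (sra (mul (sext a), (sext b)), 32) -> mulhs a, b
int32_t mulhs32(int32_t A, int32_t B) {
  int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  // Right shift of a negative value is arithmetic on mainstream
  // compilers and guaranteed by C++20.
  return static_cast<int32_t>(Wide >> 32);
}

// (srl (mul (zext a), (zext b)), 32) -> mulhu a, b
uint32_t mulhu32(uint32_t A, uint32_t B) {
  uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  return static_cast<uint32_t>(Wide >> 32);
}
```

The narrow type (i32) is half the wide type (i64) and the shift amount equals the narrow width, matching the checks listed above.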

Differential Revision: https://reviews.llvm.org/D78272
2020-06-02 15:22:48 -05:00
Djordje Todorovic 4e8e5d60b4 [CSInfo][NFC] Interpret loaded parameter value separately
The collectCallSiteParameters() method searches for instructions
which load values into registers used for parameter passing.
Previously, interpretation of those values, loaded by one such
instruction, was implemented inside collectCallSiteParameters() method.

This patch moves the interpretation code from collectCallSiteParameters()
method into a separate static method named interpretValue. The new method is
called from collectCallSiteParameters() to process each instruction from
the targeted instruction scope.

The collectCallSiteParameters() method searches for the loaded parameter value
among instructions which precede the call instruction, inside the same
basic block. When needed, the new method (interpretValue) could be used for
searching any instruction scope.

This is preparation for searching for a parameter value loaded inside a call
delay slot.

Patch by Nikola Tesic

Differential revision: https://reviews.llvm.org/D78106
2020-06-02 13:05:04 +02:00
Sriraman Tallam e0bca46b08 Options for Basic Block Sections, enabled in D68063 and D73674.
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM support for basic block sections is already enabled.

+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.

Differential Revision: https://reviews.llvm.org/D68049
2020-06-02 00:23:32 -07:00
Denis Antrushin fa818ded24 [StatepointLowering] Handle UNDEF gc values.
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
2020-06-02 10:18:33 +03:00
Richard Smith 4ccb6c36a9 Fix violations of [basic.class.scope]p2.
These cases all follow the same pattern:

struct A {
  friend class X;
  //...
  class X {};
};

But 'friend class X;' injects 'X' into the surrounding namespace scope,
rather than introducing a class member. So the second 'class X {}' is a
completely different type, which changes the meaning of the earlier name
'X' from '::X' to 'A::X'.

Additionally, the friend declaration is pointless -- members of a class
don't need to be befriended to be able to access private members.
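For contrast, the fixed form of the pattern can be sketched as follows (class and member names are illustrative): with the friend declaration dropped, the nested class X is simply a member of A and, since C++11, can read A's private members.

```cpp
// Fixed form: no friend declaration; the nested class is just a member.
class A {
  int Secret = 42;  // private member of A

public:
  // X is a member of A, so it may access A's private members without
  // being befriended (C++11 and later).
  class X {
  public:
    int peek(const A &Outer) const { return Outer.Secret; }
  };
};
```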
2020-06-01 22:03:05 -07:00
Vedant Kumar 776708b00b [LiveDebugValues] Remove early-exit when testing regmasks, NFC
In transferRegisterDef, if the instruction has a regmask attached, we'll
check if any currently used register is clobbered by the regmask.

The early exit in this scan isn't necessary, costs a set lookup, and is
almost never taken [1]. Delete it.
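The scan can be sketched as follows, using the regmask convention of MachineOperand::clobbersPhysReg (a set bit means the register is preserved, a clear bit means it is clobbered). The helper names and the std::vector-based mask are simplifications for illustration, and the removed early-exit check is not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Regmask convention: bit set = preserved, bit clear = clobbered.
static bool clobbersPhysReg(const std::vector<std::uint32_t> &RegMask,
                            unsigned Reg) {
  assert(Reg / 32 < RegMask.size() && "register outside the mask");
  return !(RegMask[Reg / 32] & (1u << (Reg % 32)));
}

// Test every currently used register against the mask, with no early exit.
static std::vector<unsigned>
clobberedRegs(const std::vector<unsigned> &UsedRegs,
              const std::vector<std::uint32_t> &RegMask) {
  std::vector<unsigned> Clobbered;
  for (unsigned Reg : UsedRegs)
    if (clobbersPhysReg(RegMask, Reg))
      Clobbered.push_back(Reg);
  return Clobbered;
}
```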

[1]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp.html#L1136
2020-06-01 15:16:10 -07:00
Vedant Kumar 11c617c417 [LiveDebugValues] Add LocIndex::u32_{location,index}_t types for readability, NFC
This is per Adrian's suggestion in https://reviews.llvm.org/D80684.
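A simplified sketch of what such a LocIndex can look like, assuming the location and index are packed into one 64-bit ID with the location in the high half. Apart from the two type aliases, the member names are illustrative.

```cpp
#include <cstdint>

// Simplified sketch: a (location, index) pair packed into one 64-bit ID.
struct LocIndex {
  using u32_location_t = std::uint32_t;  // which location (e.g. a register)
  using u32_index_t = std::uint32_t;     // which VarLoc within that location

  u32_location_t Location;
  u32_index_t Index;

  // Location in the high half, index in the low half, so all IDs for one
  // location are contiguous when sorted.
  std::uint64_t getAsRawInteger() const {
    return (static_cast<std::uint64_t>(Location) << 32) | Index;
  }

  static LocIndex fromRawInteger(std::uint64_t ID) {
    return {static_cast<u32_location_t>(ID >> 32),
            static_cast<u32_index_t>(ID & 0xFFFFFFFFu)};
  }
};
```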
2020-06-01 11:02:36 -07:00
Vedant Kumar 2ecaf93525 [LiveDebugValues] Speed up removeEntryValue, NFC
Summary:
Instead of iterating over all VarLoc IDs in removeEntryValue(), just
iterate over the interval reserved for entry value VarLocs. This changes
the iteration order, hence the test update -- otherwise this is NFC.

This appears to give an ~8.5x wall time speed-up for LiveDebugValues when
compiling sqlite3.c 3.30.1 with a Release clang (on my machine):

```
          ---User Time---   --System Time--   --User+System--   ---Wall Time--- --- Name ---
  Before: 2.5402 ( 18.8%)   0.0050 (  0.4%)   2.5452 ( 17.3%)   2.5452 ( 17.3%) Live DEBUG_VALUE analysis
   After: 0.2364 (  2.1%)   0.0034 (  0.3%)   0.2399 (  2.0%)   0.2398 (  2.0%) Live DEBUG_VALUE analysis
```

The change in removeEntryValue() is the only one that appears to affect
wall time, but for consistency (and to resolve a pending TODO), I made
the analogous changes for iterating over SpillLocKind VarLocs.
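If each ID keeps its location in the high 32 bits, all VarLoc IDs for one location (assumed here to include a location value reserved for entry values) form a contiguous interval in a sorted set. A sketch of visiting only that interval, assuming a std::set<uint64_t> in place of the real ID container and a hypothetical helper name:

```cpp
#include <cstdint>
#include <set>
#include <vector>

using VarLocID = std::uint64_t;  // (Location << 32) | Index

// Return only the IDs whose high 32 bits equal Location. lower_bound jumps
// straight to the start of the interval instead of visiting every ID.
static std::vector<VarLocID> idsInLocation(const std::set<VarLocID> &IDs,
                                           std::uint32_t Location) {
  const VarLocID Lo = static_cast<VarLocID>(Location) << 32;
  std::vector<VarLocID> Out;
  for (auto It = IDs.lower_bound(Lo);
       It != IDs.end() && (*It >> 32) == Location; ++It)
    Out.push_back(*It);
  return Out;
}
```

Comparing the high bits, rather than computing an exclusive upper bound, avoids overflow when Location is the largest 32-bit value.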

Reviewers: nikic, aprantl, jmorse, djtodoro

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80684
2020-06-01 11:02:36 -07:00
Matt Arsenault 836c7dcf12 DAG: Fix getNode dropping flags if there's a glue output
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.

Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.

Also fix a second case where flags were dropped; there, it seems getNode
just didn't handle this number of operands.
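A hypothetical miniature of a DAG builder shows why the glue case needs its own handling. In this sketch, as in SelectionDAG, nodes with a glue result bypass CSE, so that path must apply the caller's flags itself. MiniDAG, Node, and NodeFlags are invented for illustration and are not SelectionDAG's API.

```cpp
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature DAG builder, not SelectionDAG's real API.
struct NodeFlags {
  bool NoFPExcept = false;
};

struct Node {
  unsigned Opcode;
  std::vector<unsigned> Ops;  // operand IDs
  bool HasGlueResult;
  NodeFlags Flags;
};

class MiniDAG {
  std::vector<std::unique_ptr<Node>> AllNodes;
  std::map<std::pair<unsigned, std::vector<unsigned>>, Node *> CSEMap;

  Node *create(unsigned Opc, std::vector<unsigned> Ops, bool Glue,
               NodeFlags Flags) {
    AllNodes.push_back(
        std::unique_ptr<Node>(new Node{Opc, std::move(Ops), Glue, Flags}));
    return AllNodes.back().get();
  }

public:
  Node *getNode(unsigned Opc, std::vector<unsigned> Ops, bool Glue,
                NodeFlags Flags) {
    // Glue-producing nodes are never CSE'd, so this early path must pass
    // the flags along itself instead of relying on the CSE path below.
    if (Glue)
      return create(Opc, std::move(Ops), true, Flags);

    auto Key = std::make_pair(Opc, Ops);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end()) {
      // Reusing a node: keep a flag only if both requests set it.
      NodeFlags &Old = It->second->Flags;
      Old.NoFPExcept = Old.NoFPExcept && Flags.NoFPExcept;
      return It->second;
    }
    Node *N = create(Opc, std::move(Ops), false, Flags);
    CSEMap.emplace(std::move(Key), N);
    return N;
  }
};
```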

Test will be included in future AMDGPU patch.
2020-06-01 13:48:02 -04:00
hsmahesha 0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, the AMDGPU target needs to consider the number of clustered bytes
to decide on the maximum number of mem ops that can be clustered. This patch adds support for passing
the number of clustered bytes to the target's mem ops clustering logic.
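A sketch of why the byte count matters, assuming hypothetical limits (AMDGPU's real heuristic is not reproduced). The hook name echoes TargetInstrInfo::shouldClusterMemOps, but the signature here is simplified and invented; clusterLength is a hypothetical greedy caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical limits, for illustration only.
struct ClusterLimits {
  unsigned MaxOps;
  unsigned MaxBytes;
};

// Target-side decision using both the op count and the byte count.
static bool shouldClusterMemOps(unsigned NumOps, unsigned NumBytes,
                                const ClusterLimits &L) {
  return NumOps <= L.MaxOps && NumBytes <= L.MaxBytes;
}

// Greedy caller: extend the cluster while the hook accepts the new totals.
// Widths holds each memory operation's access size in bytes, in order.
static unsigned clusterLength(const std::vector<unsigned> &Widths,
                              const ClusterLimits &L) {
  if (Widths.empty())
    return 0;
  unsigned Len = 1, Bytes = Widths[0];
  for (std::size_t I = 1; I < Widths.size(); ++I) {
    if (!shouldClusterMemOps(Len + 1, Bytes + Widths[I], L))
      break;
    ++Len;
    Bytes += Widths[I];
  }
  return Len;
}
```

With limits of 4 ops and 16 bytes, five 4-byte loads cluster four at a time, while three 8-byte loads stop at two: an op-count-only limit would have allowed all three.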

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Chen Zheng 2a24d350db [MachineCombine] add a hook for resource length limit 2020-05-31 23:21:04 -04:00
Matt Arsenault 95f65a7c6c AArch64/GlobalISel: Fix incorrect ptrmask usage for alignment
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168.

I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
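For reference, aligning a pointer up with a pointer mask keeps the high bits and clears the low log2(Align) bits, so the mask must be ~(Align - 1); the inverted mask Align - 1 keeps only the low bits instead. A minimal sketch on plain integers (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Round P up to a multiple of Align (a power of two), ptrmask style:
// the result keeps only the bits where the mask is 1.
static std::uint64_t alignUpWithMask(std::uint64_t P, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  const std::uint64_t Mask = ~(Align - 1);  // clears the low bits
  return (P + Align - 1) & Mask;
}
```

With the inverted mask, (17 + 15) & 15 evaluates to 0 rather than 32.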
2020-05-31 10:56:55 -04:00
Florian Hahn ec25a71eb7 [ScheduleDAG] Avoid unnecessary recomputation of topological order.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.

    define i11129 @test1() {
      %L1 = load i11129, i11129* undef
      %B30 = ashr i11129 %L1, %L1
      store i11129 %B30, i11129* undef
      ret i11129 %L1
    }

This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not save
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.
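The idea can be sketched with a simplified, Pearce-Kelly-style dynamic topological order. This is an independent illustration of the technique, not ScheduleDAGTopologicalSort; TopoOrder and its methods are hypothetical. New nodes are appended, and adding an edge repairs only the nodes inside the affected window of positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified dynamic topological order; an illustration only.
class TopoOrder {
  std::vector<std::vector<int>> Succs, Preds;
  std::vector<int> Ord;    // node -> position in the order
  std::vector<int> Nodes;  // position -> node

  // Collect nodes reachable from N (forward) or reaching N (backward)
  // whose positions lie inside the affected window.
  void dfs(int N, bool Forward, int Bound, std::vector<bool> &Seen,
           std::vector<int> &Out) const {
    if (Seen[N])
      return;
    Seen[N] = true;
    Out.push_back(N);
    for (int M : Forward ? Succs[N] : Preds[N])
      if (Forward ? Ord[M] <= Bound : Ord[M] >= Bound)
        dfs(M, Forward, Bound, Seen, Out);
  }

public:
  // A new node has no predecessors yet, so appending keeps the order valid.
  int addNode() {
    int N = static_cast<int>(Ord.size());
    Succs.emplace_back();
    Preds.emplace_back();
    Ord.push_back(N);
    Nodes.push_back(N);
    return N;
  }

  // Add From -> To, reordering only nodes between the two positions.
  void addEdge(int From, int To) {
    Succs[From].push_back(To);
    Preds[To].push_back(From);
    const int Lo = Ord[To], Hi = Ord[From];
    if (Lo > Hi)
      return;  // From already precedes To
    std::vector<bool> Seen(Ord.size(), false);
    std::vector<int> Fwd, Bwd;
    dfs(To, /*Forward=*/true, Hi, Seen, Fwd);
    assert(!Seen[From] && "edge would create a cycle");
    dfs(From, /*Forward=*/false, Lo, Seen, Bwd);
    auto ByPos = [this](int X, int Y) { return Ord[X] < Ord[Y]; };
    std::sort(Fwd.begin(), Fwd.end(), ByPos);
    std::sort(Bwd.begin(), Bwd.end(), ByPos);
    // Reuse the same positions: nodes reaching From first, then nodes
    // reachable from To, each group keeping its relative order.
    std::vector<int> Slots;
    for (int N : Bwd)
      Slots.push_back(Ord[N]);
    for (int N : Fwd)
      Slots.push_back(Ord[N]);
    std::sort(Slots.begin(), Slots.end());
    std::size_t I = 0;
    for (int N : Bwd) {
      Ord[N] = Slots[I];
      Nodes[Slots[I++]] = N;
    }
    for (int N : Fwd) {
      Ord[N] = Slots[I];
      Nodes[Slots[I++]] = N;
    }
  }

  int position(int N) const { return Ord[N]; }
  int nodeAt(int Pos) const { return Nodes[Pos]; }
};
```

In the worst case, such as an edge from the last node to the first, the window spans the whole order, but the work is still proportional to the nodes actually reached rather than a full recomputation.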

Reviewers: MatzeB, atrick, efriedma, niravd, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D59722
2020-05-31 11:04:35 +01:00