Commit Graph

28401 Commits

Author SHA1 Message Date
Sanjay Patel 29a2b20ab3 [SDAG] simplify FP binops to undef
As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops.
The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is
poison, so we can produce an undef result.

But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation
problem in SelectionDAGBuilder.

I've added DAGCombiner calls to enable these for the other cases because we've shown in other
patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications
like this if they are done only at node creation time.

Several potential follow-ups to expand on this patch are possible.

Differential Revision: https://reviews.llvm.org/D75576
2020-03-04 10:42:16 -05:00
Amara Emerson e91e1df6ab [GlobalISel][Localizer] Enable intra-block localization of already-local uses.
This changes the localizer to attempt intra-block localizer of instructions
that have local uses. This is useful because sometimes the entry block itself
has many uses of constant-like instructions, which would benefit from shortening
live ranges. Previously if an inst had no non-local uses, we wouldn't add it to
the list of instructions to attempt further intra-block localization.

This gives a 0.7% geomean code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D75555
2020-03-03 18:14:57 -08:00
Fangrui Song 90acc505ed [MCDwarf] Change emitListsTableHeaderStart to use a reference and fold Start/End symbols generation into it
Apply @dblaikie's suggestions in a post-commit review for D75375

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D75568
2020-03-03 16:20:40 -08:00
Amy Huang 5b3b21f025 [DebugInfo] Fix for adding "returns cxx udt" option to functions in CodeView.
Summary:
This change checks for the return type in the frontend and adds a flag
to the DISubroutineType to indicate that the option should be added in
CodeViewDebug.

Previously function types sometimes appeared twice in the PDB: once with
"returns cxx udt" and once without.
See https://bugs.llvm.org/show_bug.cgi?id=44785.

Reviewers: rnk, asmith

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75215
2020-03-03 14:00:08 -08:00
Vedant Kumar f002ee55c7 [MachineVerifier] Remove placement rule exception for debug entry values
There should not be an exception allowing debug entry values to be
placed after a terminator.

Differential Revision: https://reviews.llvm.org/D75559
2020-03-03 13:02:18 -08:00
Vedant Kumar 2bf496620c [LiveDebugValues] Do not insert DBG_VALUEs after a MBB terminator
This fixes a miscompile that happened because a DBG_VALUE interfered
with the MachineOutliner's liveness analysis.

Inserting a DBG_VALUE after a terminator breaks predicates on MBB such
as isReturnBlock(). And the resulting DBG_VALUE cannot be "live".

I plan to introduce a MachineVerifier check for this situation in a
follow up.

rdar://59859175

Testing: check-llvm, LNT build with a stage2 compiler & entry values
enabled

Differential Revision: https://reviews.llvm.org/D75548
2020-03-03 13:00:52 -08:00
Fangrui Song 55a56041d1 [MCDwarf] Generate DWARF v5 .debug_rnglists for assembly files
```
// clang -c -gdwarf-5 a.s -o a.o
.section .init; ret
.text; ret
```

.debug_info contains DW_AT_ranges and llvm-dwarfdump will report
a verification error because .debug_rnglists does not exist (not
implemented).

This patch generates .debug_rnglists for assembly files.
emitListsTableHeaderStart() in DwarfDebug.cpp can be shared with
MCDwarf.cpp. Because CodeGen depends on MC, I move the function to
MCDwarf.cpp

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D75375
2020-03-03 09:03:34 -08:00
Craig Topper d8ad7cc088 [DAGCombiner][X86] Improve narrowExtractedVectorLoad to handle cases where the element size isn't byte sized by the subvector is.
Summary:
Follow up from D75377. If the subvector is byte sized and the
index is aligned to the subvector size, we can shrink the load.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: dbabokin, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75434
2020-03-03 08:41:31 -08:00
Sam Parker 5618e9be37 [RDA][ARM] collectKilledOperands across multiple blocks
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.

Differential Revision: https://reviews.llvm.org/D75167
2020-03-03 15:23:05 +00:00
Sam Parker dfe8f5da4c [ARM][RDA] Allow multiple killed users
In RDA, check against the already decided dead instructions when
looking at users. This allows an instruction to be removed if it
has multiple users, but they're all dead.

This means that IT instructions can be considered killed once all
the itstate using instructions are dead.

Differential Revision: https://reviews.llvm.org/D75245
2020-03-03 15:12:29 +00:00
Clement Courbet b0ae20d92e [ExpandMemCmp][NFC] Fix typo in comment. 2020-03-03 11:07:13 +01:00
Awanish Pandey 1cb0e01e42 [DebugInfo][DWARF5]: Added support for debuginfo generation for defaulted parameters
This patch adds support for dwarf emission/dumping part of debuginfo
generation for defaulted parameters.

Reviewers: probinson, aprantl, dblaikie

Reviewed By: aprantl, dblaikie

Differential Revision: https://reviews.llvm.org/D73462
2020-03-03 13:09:53 +05:30
Vedant Kumar d64a22a2ad [LiveDebugValues] Prevent some misuse of LocIndex::fromRawInteger, NFC
Make it a compile-time error to pass an int/unsigned/etc to
fromRawInteger.

Hopefully this prevents errors of the form:

```
for (unsigned ID : getVarLocs()) {
  auto VL = LocMap[LocIndex::fromRawInteger(ID)];
  ...
```
2020-03-02 16:59:09 -08:00
Jordan Rupprecht d7803c3832 Add default case to fix -Wswitch errors 2020-03-02 14:23:46 -08:00
Craig Topper adc69729ec [TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC.
I expect that the isCondCodeLegal checks should match that CC of
the node that we're going to create.

Rewriting to a switch to minimize repeated mentions of the same
constants.
2020-03-02 14:12:16 -08:00
Stanislav Mekhanoshin 1bacdcf48d Extend LaneBitmask to 64 bit
This is needed for D74873, AMDGPU going to have 16 bit subregs
and the largest tuple is 32 VGPRs, which results in 64 lanes.

Differential Revision: https://reviews.llvm.org/D75378
2020-03-02 12:10:52 -08:00
Volkan Keles 4167645d1e GlobalISel: Move Localizer::shouldLocalize(..) to TargetLowering
Add a new target hook for shouldLocalize so that
targets can customize the logic.

https://reviews.llvm.org/D75207
2020-03-02 09:15:40 -08:00
Simon Pilgrim d20fb7ea13 Fix shadow variable warning. NFC. 2020-03-02 11:41:20 +00:00
Simon Pilgrim e4380b07cc Fix operator precedence warning. NFCI. 2020-03-02 10:56:58 +00:00
Serguei Katkov 496e0a99c7 [InlineSpiller] Relax re-materialization restriction for statepoint
We should be careful to allow count of re-materialization of operands to be less
then number of physical registers.

STATEPOINT instruction has a variable number of operands and potentially very big.
So re-materialization for all operands is disabled at the moment if restrict-statepoint-remat is true.

The patch relaxes the re-materialization restriction for STATEPOINT instruction allowing it for
fixed operands. Specifically it is about call target.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits, qcolombet, hiraditya
Differential Revision: https://reviews.llvm.org/D75335
2020-03-02 11:25:44 +07:00
Craig Topper 0cd6712a7a [DAGCombiner][X86] Disable narrowExtractedVectorLoad if the element type size isn't byte sized
The address calculation for the offset assumes that you can calculate the offset by multiplying the index by the store size of the element. But that only works if the element's store size is exactly its real size since we store vectors tightly packed in memory. There are improvements we could make to this like special casing extracting element 0. I think we could also handle cases where the extracted VT is byte sized and the index is aligned with the extract element count.

Differential Revision: https://reviews.llvm.org/D75377
2020-03-01 18:13:25 -08:00
Craig Topper b6e2796114 [X86][TwoAddressInstructionPass] Teach tryInstructionCommute to continue checking for commutable FMA operands in more cases.
Previously we would only check for another commutable operand if the first commute was an aggressive commute.

But if we have two kill operands and neither is tied to the def at the start, we should consider both operands as the one to use as the new def.

This improves the loop in the fma-commute-loop.ll test. This test is derived from a post from discourse here https://llvm.discourse.group/t/unnecessary-vmovapd-instructions-generated-can-you-hint-in-favor-of-vfmadd231pd/582

Differential Revision: https://reviews.llvm.org/D75016
2020-03-01 16:38:08 -08:00
Craig Topper 211fb91f10 [DAGCombiner] Don't emit select_cc from visitSINT_TO_FP/visitUINT_TO_FP. Use plain select instead.
Select_cc isn't used by all targets. X86 doesn't have optimizations
for it.

Since we already know the input to the sint_to_fp/uint_to_fp is
a setcc we can just emit a plain select using that setcc as the
condition. Other DAG combines can turn that into a select_cc on
targets that support it.

Differential Revision: https://reviews.llvm.org/D75415
2020-03-01 10:52:17 -08:00
Sanjay Patel 619d7dc39a [DAGCombiner] recognize shuffle (shuffle X, Mask0), Mask --> splat X
We get the simple cases of this via demanded elements and other folds,
but that doesn't work if the values have >1 use, so add a dedicated
match for the pattern.

We already have this transform in IR, but it doesn't help the
motivating x86 tests (based on PR42024) because the shuffles don't
exist until after legalization and other combines have happened.
The AArch64 test shows a minimal IR example of the problem.

Differential Revision: https://reviews.llvm.org/D75348
2020-03-01 09:10:25 -05:00
Simon Pilgrim d955b221cb [MachineInst] Remove dead code. NFCI.
The MachineFunction MF value is not used any more and is always null.
2020-02-29 19:25:02 +00:00
Simon Pilgrim 6e7a768354 Make argument const to silence cppcheck warning. NFCI. 2020-02-29 19:25:01 +00:00
Fangrui Song 692e0c9648 [MC] Add MCStreamer::emitInt{8,16,32,64}
Similar to AsmPrinter::emitInt{8,16,32,64}.
2020-02-29 09:40:21 -08:00
Vedant Kumar dd1ea9de2e Reland: [Coverage] Revise format to reduce binary size
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).

---

Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 18:12:04 -08:00
Vedant Kumar 3388871714 Revert "[Coverage] Revise format to reduce binary size"
This reverts commit 99317124e1. This is
still busted on Windows:

http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/40873

The llvm-cov tests report 'error: Could not load coverage information'.
2020-02-28 18:03:15 -08:00
Vedant Kumar 99317124e1 [Coverage] Revise format to reduce binary size
Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2020-02-28 17:33:25 -08:00
Vedant Kumar 0368b42295 [entry values] ARM: Add a describeLoadedValue override (PR45025)
As a narrow stopgap for the assertion failure described in PR45025, add
a describeLoadedValue override to ARMBaseInstrInfo and use it to detect
copies in which the forwarding reg is a super/sub reg of the copy
destination. For the moment this is unsupported.

Several follow ups are possible:

1) Handle VORRq. At the moment, we do not, because isCopyInstrImpl
   returns early when !MI.isMoveReg().

2) In the case where forwarding reg is a super-reg of the copy
   destination, we should be able to describe the forwarding reg as a
   subreg within the copy destination. I'm not 100% sure about this, but
   it looks like that's what's done in AArch64InstrInfo.

3) In the case where the forwarding reg is a sub-reg of the copy
   destination, maybe we could describe the forwarding reg using the
   copy destinaion and a DW_OP_LLVM_fragment (I guess this should be
   possible after D75036).

https://bugs.llvm.org/show_bug.cgi?id=45025
rdar://59772698

Differential Revision: https://reviews.llvm.org/D75273
2020-02-28 14:30:40 -08:00
David Green 1de1070559 [DAGCombine] Fix alias analysis for unaligned accesses
The alias analysis in DAG Combine looks at the BaseAlign, the Offset and
the Size of two accesses, and determines if they are known to access
different parts of memory by the fact that they are different offsets
from inside that "alignment window". It does not seem to account for
accesses that are not a multiple of the size, and may overflow from one
alignment window into another.

For example in the test case we have a 19byte memset that is splits into
a 16 byte neon store and an unaligned 4 byte store with a 15 byte
offset. This 15byte offset (with a base align of 8) wraps around to the
next alignment windows. When compared to an access that is a 16byte
offset (of the same 4byte size and 8byte basealign), the two accesses
are said not to alias.

I've fixed this here by just ensuring that the offsets are a multiple of
the size, ensuring that they don't overlap by wrapping. Fixes PR45035,
which was exposed by the UseAA changes in the arm backend.

Differential Revision: https://reviews.llvm.org/D75238
2020-02-28 18:44:36 +00:00
Simon Pilgrim 4bc6f63320 [TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug
We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted.

This is causing a number of regressions that need addressing but we need to get the bugfix in first.
2020-02-28 15:23:37 +00:00
Jeremy Morse 6af859dcca [DebugInfo] Re-implement LexicalScopes dominance method, add unit tests
Way back in D24994, the combination of LexicalScopes::dominates and
LiveDebugValues was identified as having worst-case quadratic complexity,
but it wasn't triggered by any code path at the time. I've since run into a
scenario where this occurs, in a very large basic block where large numbers
of inlined DBG_VALUEs are present.

The quadratic-ness comes from LiveDebugValues::join calling "dominates" on
every variable location, and LexicalScopes::dominates potentially touching
every instruction in a block to test for the presence of a scope. We have,
however, already computed the presence of scopes in blocks, in the
"InstrRanges" of each scope. This patch switches the dominates method to
examine whether a block is present in a scope's InsnRanges, avoiding
walking through the whole block.

At the same time, fix getMachineBasicBlocks to account for the fact that
InsnRanges can cover multiple blocks, and add some unit tests, as Lexical
Scopes didn't have any.

Differential revision: https://reviews.llvm.org/D73725
2020-02-28 11:41:28 +00:00
Sam Parker bf61421a02 [RDA] Track implicit-defs
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.

Differential Revision: https://reviews.llvm.org/D75185
2020-02-28 11:14:42 +00:00
serge-sans-paille 6d15c4deab No longer generate calls to *_finite
According to Joseph Myers, a libm maintainer

> They were only ever an ABI (selected by use of -ffinite-math-only or
> options implying it, which resulted in the headers using "asm" to redirect
> calls to some libm functions), not an API. The change means that ABI has
> turned into compat symbols (only available for existing binaries, not for
> anything newly linked, not included in static libm at all, not included in
> shared libm for future glibc ports such as RV32), so, yes, in any case
> where tools generate direct calls to those functions (rather than just
> following the "asm" annotations on function declarations in the headers),
> they need to stop doing so.

As a consequence, we should no longer assume these symbols are available on the
target system.

Still keep the TargetLibraryInfo for constant folding.

Differential Revision: https://reviews.llvm.org/D74712
2020-02-28 10:07:37 +01:00
Vedant Kumar a993720397 [LiveDebugValues] Encode register location within VarLoc IDs [3/3]
This is part 3 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Start encoding register locations within VarLoc IDs, and take advantage
of this encoding to speed up transferRegisterDef.

There is no fundamental algorithmic change: this patch simply swaps out
SparseBitVector in favor of CoalescingBitVector. That changes iteration
order (hence the test updates), but otherwise this patch is NFCI.

The only interesting change is in transferRegisterDef. Instead of doing:

```
KillSet = {}
for (ID : OpenRanges.getVarLocs())
  if (DeadRegs.count(ID))
    KillSet.add(ID)
```

We now do:

```
KillSet = {}
for (Reg : DeadRegs)
  for (ID : intervalsReservedForReg(Reg, OpenRanges.getVarLocs()))
    KillSet.add(ID)
```

By not visiting each open location every time we visit an instruction,
this eliminates some potentially quadratic behavior. The new
implementation basically does a constant amount of work per instruction
because the interval map lookups are very fast.

For a file in WebKit, this brings the time spent in LiveDebugValues down
from ~2.5 minutes to 4 seconds, reducing compile time spent in that pass
from 28% of the total to just over 1%.

Before:

```
2.49 min   27.8%	0 s	LiveDebugValues::process
2.41 min   27.0%	5.40 s	LiveDebugValues::transferRegisterDef
1.51 min   16.9%	1.51 min LiveDebugValues::VarLoc::isDescribedByReg() const
32.73 s    6.1%		8.70 s	 llvm::SparseBitVector<128u>::SparseBitVectorIterator::operator++()
```

After:

```
4.53 s	1.1%	0 s	LiveDebugValues::process
3.00 s	0.7%	107.00 ms		LiveDebugValues::transferRegisterCopy
892.00 ms	0.2%	406.00 ms	LiveDebugValues::transferSpillOrRestoreInst
404.00 ms	0.1%	32.00 ms	LiveDebugValues::transferRegisterDef
110.00 ms	0.0%	2.00 ms		  LiveDebugValues::getUsedRegs
57.00 ms	0.0%	1.00 ms		  std::__1::vector<>::push_back
40.00 ms	0.0%	1.00 ms		  llvm::CoalescingBitVector<>::find(unsigned long long)
```

FWIW, I tried the same approach using SparseBitVector, but got bad
results. To do that, I had to extend SparseBitVector to support 64-bit
indices and expose its lower bound operation. The problem with this is
that the performance is very hard to predict: SparseBitVector's lower
bound operation falls back to O(n) linear scans in a std::list if you're
not /very/ careful about managing iteration order. When I profiled this
the performance looked worse than the baseline.

You can see the full CoalescingBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-coalescing

You can see the full SparseBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-sparsebitvec-find

Depends on D74984 and D74985.

Differential Revision: https://reviews.llvm.org/D74986
2020-02-27 12:39:47 -08:00
Vedant Kumar 210c4853de [LiveDebugValues] Encode a location in VarLoc IDs, NFC [2/3]
This is part 2 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Each VarLoc has a unique ID: this ID is used to look up a VarLoc in the
VarLocMap, and to virtually insert a VarLoc into a VarLocSet. Instead of
inserting the VarLoc /itself/ into the VarLocSet, we insert just the ID,
because this can be represented efficiently with a SparseBitVector.

This change introduces LocIndex, a layer of abstraction on top of VarLoc
IDs. Prior to this change, an ID was just an index into a vector. With
this change, an ID encodes both an index /and/ a register location. The
type-checker ensures that conversions to and from LocIndex are correct.

For the moment the register location is always 0 (undef). We have plenty
of bits left over to encode physregs, stack slots, and other locations
in the future.

Differential Revision: https://reviews.llvm.org/D74985
2020-02-27 12:39:47 -08:00
Sanjay Patel 90fd859f51 [x86] use instruction-level fast-math-flags to drive MachineCombiner
The code changes here are hopefully straightforward:

1. Use MachineInstruction flags to decide if FP ops can be reassociated
   (use both "reassoc" and "nsz" to be consistent with IR transforms;
   we probably don't need "nsz", but that's a safer interpretation of
   the FMF).
2. Check that both nodes allow reassociation to change instructions.
   This is a stronger requirement than we've usually implemented in
   IR/DAG, but this is needed to solve the motivating bug (see below),
   and it seems unlikely to impede optimization at this late stage.
3. Intersect/propagate MachineIR flags to enable further reassociation
   in MachineCombiner.

We managed to make MachineCombiner flexible enough that no changes are
needed to that pass itself. So this patch should only affect x86
(assuming no other targets have implemented the hooks using MachineIR
flags yet).

The motivating example in PR43609 is another case of fast-math transforms
interacting badly with special FP ops created during lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609
The special fadd ops used for converting int to FP assume that they will
not be altered, so those are created without FMF.

However, the MachineCombiner pass was being enabled for FP ops using the
global/function-level TargetOption for "UnsafeFPMath". We managed to run
instruction/node-level FMF all the way down to MachineIR sometime in the
last 1-2 years though, so we can do better now.

The test diffs require some explanation:

1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was
   specified here, so MachineCombiner kicks in where it did not previously;
   to make it behave consistently, we need to specify a CPU schedule model,
   so use the default model, and there are no code diffs.
2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for
   unsafe math with the equivalent IR-level flags, and there are no code diffs;
   we can't remove the NaN/nsz options because those are still used to drive
   x86 fmin/fmax codegen (special SDAG opcodes).
3. llvm/test/CodeGen/X86/pow.ll - similar to #1
4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner
   does some reassociation of the estimate sequence ops; presumably these are
   perf wins based on latency/throughput (and we get some reduction of move
   instructions too); I'm not sure how it affects numerical accuracy, but the
   test reflects reality better now because we would expect MachineCombiner to
   be enabled if the IR was generated via something like "-ffast-math" with clang.
5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609;
   the fadds are not reassociated now, so we should get the expected results.
6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1
7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1

Differential Revision: https://reviews.llvm.org/D74851
2020-02-27 15:19:37 -05:00
Djordje Todorovic 016d91ccbd [CallSiteInfo] Handle bundles when updating call site info
This will address the issue: P8198 and P8199 (from D73534).

The methods was not handle bundles properly.

Differential Revision: https://reviews.llvm.org/D74904
2020-02-27 13:57:06 +01:00
David Stenberg 6d857166d2 [DebugInfo] Describe call site values for chains of expression producing instrs
Summary:
If the describeLoadedValue() hook produced a DIExpression when
describing a instruction, and it was not possible to emit a call site
entry directly (the value operand was not an immediate nor a preserved
register), then that described value could not be inserted into the
worklist, and would instead be dropped, meaning that the parameter's
call site value couldn't be described.

This patch extends the worklist so that each entry has an DIExpression
that is built up when iterating through the instructions.

This allows us to describe instruction chains like this:

  $reg0 = mv $fp
  $reg0 = add $reg0, offset
  call @call_with_offseted_fp

Since DW_OP_LLVM_entry_value operations can't be combined with any other
expression, such call site entries will not be emitted. I have added a
test, dbgcall-site-expr-entry-value.mir, which verifies that we don't
assert or emit broken DWARF in such cases.

Reviewers: djtodoro, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D75036
2020-02-27 11:18:51 +01:00
David Stenberg ff574ff291 [DebugInfo][NFC] Move out lambdas from collectCallSiteParameters()
Summary:
This is a preparatory patch for D75036, in which a debug expression is
associated with each parameter register in the worklist. In that patch
the two lambda functions addToWorklist() and finishCallSiteParams() grow
a bit, so move those out to separate functions. This patch also prepares
for each parameter register having their own expression moving the
creation of the DbgValueLoc into finishCallSiteParams().

Reviewers: djtodoro, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D75050
2020-02-27 11:18:51 +01:00
Matt Arsenault 6fc0d00823 GlobalISel: Fix lowering for G_UADDE/G_USUBE
The type parameter passed into lower is invalid and should be removed
from the function.
2020-02-26 19:10:52 -08:00
Matt Arsenault c7e8d8b13e GlobalISel: Cleanup code with MachineIRBuilder features 2020-02-26 19:10:34 -08:00
Krzysztof Parzyszek fd7c2e24c1 [SDAG] Add SDNode::values() = make_range(values_begin(), values_end())
Also use it in a few places to simplify code a little bit.  NFC
2020-02-26 12:07:38 -06:00
Sanjay Patel b3d0c79836 [DAGCombiner] avoid narrowing fake fneg vector op
This may inhibit vector narrowing in general, but there's
already an inconsistency in the way that we deal with this
pattern as shown by the test diff.

We may want to add a dedicated function for narrowing fneg.
It's often folded into some other op, so moving it away from
other math ops may cause regressions that we would not see
for normal binops.

See D73978 for more details.
2020-02-26 11:25:56 -05:00
Simon Pilgrim bbb0933e3d [DAG] visitRotate - modulo non-uniform constant rotation amounts 2020-02-26 15:43:12 +00:00
Sam Parker 1d06e75df2 [ARM][RDA] add getUniqueReachingMIDef
Add getUniqueReachingMIDef to RDA which performs a global search for
a machine instruction that produces a unique definition of a given
register at a given point. Also add two helper functions
(getMIOperand) that wrap around this functionality to get the
incoming definition uses of a given instruction. These now replace
the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef
has been renamed to getReachingLocalMIDef and has been made private
along with getInstFromId.

Differential Revision: https://reviews.llvm.org/D74605
2020-02-26 11:15:26 +00:00
Fangrui Song b61a4aaca5 [MC] Default MCContext::UseNamesOnTempLabels to false and only set it to true for MCAsmStreamer
Only MCAsmStreamer (assembly output) needs to keep names of temporary labels created by
MCContext::createTempSymbol().

This change made the rL236642 optimization available for cc2as and
probably some other users.

This eliminates a behavior difference between llvm-mc -filetype=obj and cc1as, which caused
https://reviews.llvm.org/D74006#1890487

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75097
2020-02-25 18:23:10 -08:00
Craig Topper 735d27dc40 [SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.

This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.

Differential Revision: https://reviews.llvm.org/D75132
2020-02-25 16:58:23 -08:00
Quentin Colombet 5bf0023b0d [GISel][KnownBits] Update a comment regarding the effect of cache on PHIs
Unlike what I claimed in my previous commit. The caching is
actually not NFC on PHIs.

When we put a big enough max depth, we end up simulating loops.
The cache is effectively cutting the simulation short and we
get less information as a result.
E.g.,
```
v0 = G_CONSTANT i8 0xC0
jump
v1 = G_PHI i8 v0, v2
v2 = G_LSHR i8 v1, 1
```

Let say we want the known bits of v1.
- With cache:
Set v1 cache to we know nothing
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is in the cache
=> v1 is 0, thus v2 is 0x80
Finally v1 is v0 & v2 => 0x80

- Without cache and enough depth to do two iteration of the loop:
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is v0 & v2
v0 is 0xC0
v2 is v1 >> 1
Reach the max depth for v1...
unwinding
v1 is know nothing
v2 is 0x80
v0 is 0xC0
v1 is 0x80
v2 is 0xC0
v0 is 0xC0
v1 is 0xC0

Thus now v1 is 0xC0 instead of 0x80.

I've added a unittest demonstrating that.

NFC
2020-02-25 15:56:15 -08:00
Scott Linder 915b4aa139 Support emitting .cfi_undefined in CodeGen
This will be used by AMDGPU.

Differential Revision: https://reviews.llvm.org/D74914
2020-02-25 14:00:01 -05:00
Quentin Colombet a12f1d6a52 [MachineInstr] Add a dumpr method
Add a dump method that recursively prints an instruction and all
the instructions defining its operands and so on.

This is helpful when looking at combiner issue.

NFC

Differential Revision: https://reviews.llvm.org/D75094
2020-02-25 10:46:29 -08:00
Roman Lebedev d20907d1de
[Codegen] Revert rL354676/rL354677 and followups - introduced PR43446 miscompile
This reverts https://reviews.llvm.org/D58468
(rL354676, 44037d7a63),
and all and any follow-ups to that code block.

https://bugs.llvm.org/show_bug.cgi?id=43446
2020-02-25 20:30:12 +03:00
Jay Foad ccee390767 GlobalISel: NFC minor cleanup to avoid a couple of fixed size local arrays 2020-02-25 09:49:19 +00:00
Roman Tereshin b3bce6a3dd [MachineVerifier] Doing ::calcRegsPassed over faster sets: ~15-20% faster MV, NFC
MachineVerifier still takes 45-50% of total compile time with
-verify-machineinstrs, with calcRegsPassed dataflow taking ~50-60% of
MachineVerifier.

The majority of that time is spent in BBInfo::addPassed, mostly within
DenseSet implementing the sets the dataflow is operating over.

In particular, 1/4 of that DenseSet time is spent just iterating over it
(operator++), 40-50% on insertions, and most of the rest in ::count.

Given that, we're implementing custom sets just for this analysis here,
focusing on cheap insertions and O(n) iteration time (as opposed to
O(U), where U is the universe).

As it's based _mostly_ on BitVector for sparse and SmallVector for
dense, it may remotely resemble SparseSet. The difference is, our
solution is a lot less clever, doesn't have constant time `clear` that
we won't use anyway as reusing these sets across analyses is cumbersome,
and thus more space efficient and safer (got a resizable Universe and a
fallback to DenseSet for sparse if it gets too big).

With this patch MachineVerifier gets ~15-20% faster, its contribution to
total compile time drops from 45-50% to ~35%, while contribution of
calcRegsPassed to MachineVerifier drops from 50-60% to ~35% as well.

calcRegsPassed itself gets another 2x faster here.

All measured on a large suite of shaders targeting a number of GPUs.

Reviewers: bogner, stoklund, rudkx, qcolombet

Reviewed By: rudkx

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75033
2020-02-24 19:01:21 -08:00
Bill Wendling 23c2a5ce33 Allow "callbr" to return non-void values
Summary:
Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."

This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.

To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:

  - Any physical registers defined by "INLINEASM_BR" need to be marked as
    live-in into the block with the "COPY" instructions. This violates an
    assumption that physical registers aren't marked as "live-in" until after
    register allocation. But it seems as if the live-in information only
    needs to be correct after register allocation. So we're able to get away
    with this.

  - The indirect branches from the "INLINEASM_BR" are moved to the "COPY"
    block. This is to satisfy PHI nodes.

I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.

Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner

Reviewed By: nickdesaulniers, MaskRay, lattner

Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D69868
2020-02-24 18:29:06 -08:00
Matt Arsenault 11e3dde625 GlobalISel: Reimplement fewerElementsVectorBasic
Changes the handling of odd breakdowns, and avoids using
G_EXTRACT/G_INSERT. Pad with undef to a wider size, and unmerge. Also
avoid introducing instructions for the fully undef components.
2020-02-24 21:19:47 -05:00
Craig Topper a5fa778882 [LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad
Should fix PR42803 and PR44902

Differential Revision: https://reviews.llvm.org/D74590
2020-02-24 15:14:33 -08:00
Roman Tereshin 6f87b162e6 [MachineVerifier] Doing ::calcRegsPassed in RPO: ~35% faster MV, NFC
Depending on the target, test suite, pipeline config and perhaps other
factors machine verifier when forced on with -verify-machineinstrs can
increase compile time 2-2.5 times over (Release, Asserts On), taking up
~60% of the time. An invaluable tool, it significantly slows down
machine verifier-enabled testing.

Nearly 75% of its time MachineVerifier spends in the calcRegsPassed
method. It's a classic forward dataflow analysis executed over sets, but
visiting MBBs in arbitrary order. We switch that to RPO here.

This speeds up MachineVerifier by about 35%, decreasing the overall
compile time with -verify-machineinstrs by 20-25% or so.

calcRegsPassed itself gets 2x faster here.

All measured on a large suite of shaders targeting a number of GPUs.

Reviewers: bogner, stoklund, rudkx, qcolombet

Reviewed By: bogner

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75032
2020-02-24 13:30:01 -08:00
Simon Pilgrim 53b597cfa2 [SelectionDAG] Merge constant SDNode arithmetic into foldConstantArithmetic
This is the second patch as part of https://bugs.llvm.org/show_bug.cgi?id=36544

Merging in the ConstantSDNode variant of FoldConstantArithmetic. After this, I will begin merging in FoldConstantVectorArithmetic

I've ensured this patch can build & pass all lit tests in Windows and Linux environments.

Patch by @justice_adams (Justice Adams)

Differential Revision: https://reviews.llvm.org/D74881
2020-02-24 18:54:22 +00:00
Sjoerd Meijer 7efabe5c7d [MIR][ARM] MachineOperand comments
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:

  dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
  t2Bcc %bb.4, 0, killed $cpsr

we now print this:

  dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
  t2Bcc %bb.4, 0 /* CC:eq */, killed $cpsr

This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::always */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC:eq */ as a comment.

As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer extremely minimal and no target
specific parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hooks that is used to create the machine
operand comments.

Differential Revision: https://reviews.llvm.org/D74306
2020-02-24 14:19:21 +00:00
Sam Parker a67eb221e2 [RDA][ARM][LowOverheadLoops] Iteration count IT blocks
Change the way that we remove the redundant iteration count code in
the presence of IT blocks. collectLocalKilledOperands has been
introduced to scan an instructions operands, collecting the killed
instructions and then visiting them too. This is used to delete the
code in the preheader which calculates the iteration count. We also
track any IT blocks within the preheader and, if we remove all the
instructions from the IT block, we also remove the IT instruction.
isSafeToRemove is used to remove any redundant uses of the iteration
count within the loop body.

Differential Revision: https://reviews.llvm.org/D74975
2020-02-24 13:51:03 +00:00
Bevin Hansson 6e561d1c94 [Intrinsic] Add fixed point saturating division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:

```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```

These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Reviewers: bjope, leonardchan, craig.topper

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71550
2020-02-24 10:50:52 +01:00
Bevin Hansson c3f36acc92 [MC] Widen the functional unit type from 32 to 64 bits.
Summary:
The type used to represent functional units in MC is
'unsigned', which is 32 bits wide. This is currently
not a problem in any upstream target as no one seems
to have hit the limit on this yet, but in our
downstream one, we need to define more than 32
functional units.

Increasing the size does not seem to cause a huge
size increase in the binary (an llc debug build went
from 1366497672 to 1366523984, a difference of 26k),
so perhaps it would be acceptable to have this patch
applied upstream as well.

Subscribers: hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71210
2020-02-24 09:37:00 +01:00
Craig Topper 3a6bb32bd2 [SelectionDAG] Remove ISD::LIFETIME_START/LIFETIME_END from assert in getMemIntrinsicNode.
These appear to have their own SDNode type and shouldn't use
MemIntrinsicSDNode.
2020-02-23 22:32:36 -08:00
Florian Hahn 7769030b93 Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.

This reverts the revert commit
c7fc0e5da6.
2020-02-23 18:33:18 +00:00
Sanjay Patel a253a2a793 [SDAG] fold fsub -0.0, undef to undef rather than NaN
A question about this behavior came up on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html
...and as part of backend improvements in D73978.

We decided not to implement a more general change that would have
folded any FP binop with nearly arbitrary constant + undef operand
to undef because that is not theoretically correct (even if it is
practically correct).

This is the SDAG-equivalent to the IR change in D74713.
2020-02-23 11:36:53 -05:00
Quentin Colombet b6d63c92ec [GISel][KnownBits] Suppress unused warning on the dump method
NFC
2020-02-21 21:07:04 -08:00
Quentin Colombet 618dec2aef [GISel][KnownBits] Add a cache mechanism to speed compile time
This patch adds a cache that is valid only for the duration of a call
to getKnownBits. With such short lived cache we avoid all the problems
of cache invalidation while still getting the benefits of reusing
the information we already computed.

This cache is useful whenever an instruction occurs more than once
in a chain of computation.
E.g.,
v0 = G_ADD v1, v2
v3 = G_ADD v0, v1

Previously we would compute the known bits for:
v1, v2, v0, then v1 again and finally v3.

With the patch, now we won't have to recompute v1 again.

NFC
2020-02-21 14:31:42 -08:00
Francesco Petrogalli 31ec721516 [llvm][CodeGen] DAG Combiner folds for vscale.
Summary:
This patch simplifies the DAGs generated when using the intrinsic `@llvm.vscale.*` as follows:

* Fold (add (vscale * C0), (vscale * C1)) to (vscale * (C0 + C1)).
* Canonicalize (sub X, (vscale * C)) to (add X,  (vscale * -C)).
* Fold (mul (vscale * C0), C1) to (vscale * (C0 * C1)).
* Fold (shl (vscale * C0), C1) to (vscale * (C0 << C1)).

The test `sve-gep-ll` have been updated to reflect the folding introduced by this patch.

Reviewers: efriedma, sdesmalen, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74782
2020-02-21 18:03:12 +00:00
Hiroshi Yamauchi 0e3e242209 [BFI] Fix missed BFI updates in MachineSink.
Summary:
This prevents BFI queries on new blocks (from
MachineSinking::GetAllSortedSuccessors) and fixes a bunch of assert failures
under -check-bfi-unknown-block-queries=true.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74511
2020-02-21 09:50:54 -08:00
Nikita Popov a8db806d52 [SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.

To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.

Differential Revision: https://reviews.llvm.org/D74792
2020-02-21 18:26:05 +01:00
Jay Foad cab39e4b8c GlobalISel: Fix narrowing of (G_ASHR i64:x, 32)
Reviewers: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74950
2020-02-21 16:51:03 +00:00
Simon Pilgrim 42ec6fdce9 [TargetLowering] Apply basic shift combines before recursive SimplifyDemandedBits calls.
Minor refactor/cleanup before we begin adding non-uniform support.
2020-02-21 16:31:20 +00:00
Simon Pilgrim 86c52af05a [TargetLowering] SimplifyDemandedBits - use getValidShiftAmountConstant helper.
Use the SelectionDAG::getValidShiftAmountConstant helper to get const/constsplat shift amounts, which allows us to drop the out of range shift amount early-out.

First step towards better non-uniform shift amount support in SimplifyDemandedBits.
2020-02-21 14:23:53 +00:00
Sam Clegg df74033ec9 [WebAssembly] Remove unneeded getWasmKindForNamedSection function
I believe this was carried over from getELFKindForNamedSection since
the wasm backend originally used ELF object writing as a template.

Differential Revision: https://reviews.llvm.org/D74565
2020-02-20 22:49:08 -08:00
Eli Friedman c767cf24e4 [SVE] Add support for lowering GEPs involving scalable vectors.
This includes both GEPs where the indexed type is a scalable vector, and
GEPs where the result type is a scalable vector.

Differential Revision: https://reviews.llvm.org/D73602
2020-02-20 13:45:41 -08:00
Quentin Colombet e4a9225f5d [GISel][KnownBits] Give up on PHI analysis as soon as we don't know anything
When analyzing PHIs, we gather the known bits for every operand and
merge them together to get the known bits of the result of the PHI.
It is not unusual that merging the information leads to know nothing
on the result (e.g., phi a: i8 3, b: i8 unknown, ..., after looking at the
second argument we know we will know nothing on the result), thus, as
soon as we reach that state, stop analyzing the following operand (i.e.,
on the previous example, we won't process anything after looking at `b`).

This improves compile time in particular with PHIs with a large number
of operands.

NFC.
2020-02-20 11:34:01 -08:00
Simon Pilgrim f9c326364e [DAGCombiner] Use SDValue::getConstantOperandAPInt helper where possible. NFC. 2020-02-20 18:23:05 +00:00
Simon Pilgrim fc2b4a02b1 [DAGCombine] visitEXTRACT_VECTOR_ELT - add SimplifyDemandedBits multi use support
Similar to what we already do with SimplifyDemandedVectorElts, call SimplifyDemandedBits across all the extracted elements of the source vector, treating it as single use.

There's a minor regression in store-weird-sizes.ll which will be addressed in an upcoming SimplifyDemandedBits patch.
2020-02-20 15:49:38 +00:00
Sam Parker 659500c0c9 [NFC][RDA] Break-up initialization code
Separate out the initialization code from the loop traversal so
that the analysis can be reset and re-run by a user.
2020-02-20 14:59:42 +00:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure found on an ARM 2-stage buildbot.
The investigation is needed.
2020-02-20 14:41:39 +01:00
Bill Wendling 129c911efa Include static prof data when collecting loop BBs
Summary:
If the programmer adds static profile data to a branch---i.e. uses
"__builtin_expect()" or similar---then we should honor it. Otherwise,
"__builtin_expect()" is ignored in crucial situations. So we trust that
the programmer knows what they're doing until proven wrong.

Subscribers: hiraditya, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74809
2020-02-19 11:33:48 -08:00
Florian Hahn c7fc0e5da6 Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check."
This reverts commit e01a3d49c2.
and commit a6a585b803.

This causes a failure on GreenDragon:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597
2020-02-19 19:37:08 +01:00
Florian Hahn e01a3d49c2 [PatternMatch] Match XOR variant of unsigned-add overflow check.
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D74228
2020-02-19 15:25:18 +01:00
Florian Hahn 216afd3301 [TargetLower] Update shouldFormOverflowOp check if math is used.
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.

This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.

Reviewers: nikic, RKSimon, lebedev.ri, spatel

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D74722
2020-02-19 11:28:33 +01:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Aditya Nandakumar b91d9ec0bb [GlobalISel]: Fix some non determinism exposed in CSE due to not notifying observers about mutations + add verification for CSE
https://reviews.llvm.org/D67133

While investigating some non determinism (CSE doesn't produce wrong
code, it just doesn't CSE some times) in GISel CSE on an out of tree
target, I realized that the core issue was that there were lots of code
that mutates (setReg, setRegClass etc), but doesn't notify observers
(CSE in this case but this could be any other observer). In order to
make the Observer be available in various parts of code and to avoid
having to thread it through various API, the MachineFunction now has the
observer as field. This allows it to be easily used in helper functions
such as constrainOperandRegClass.
Also added some invariant verification method in CSEInfo which can
catch these issues (when CSE is enabled).
2020-02-18 14:54:57 -08:00
Thomas Lively 9d37f5afac [WebAssembly] Implement multivalue call_indirects
Summary:
Unlike normal calls, call_indirects have immediate arguments that
caused a MachineVerifier failure without a small tweak to loosen the
verifier's requirements for variadicOpsAreDefs instructions.

One nice thing about the new call_indirects is that they do not need
to participate in the PCALL_INDIRECT mechanism because their post-isel
hook handles moving the function pointer argument and adding the flags
and typeindex arguments itself.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74191
2020-02-18 13:49:46 -08:00
Thomas Lively 7b64a59060 Reland "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
This reverts commit 649aba93a2, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
2020-02-18 13:49:46 -08:00
Simon Pilgrim d6eef0614f [TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC. 2020-02-18 19:53:50 +00:00
Huihui Zhang 8ee0e1dc02 [NFC] Silence compiler warning [-Wmissing-braces]. 2020-02-18 10:37:12 -08:00
Sander de Smalen 8fbc925807 Add OffsetIsScalable to getMemOperandWithOffset
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo,
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale - and derived, Offset - possibly being scalable.

This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.

In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking for example.

Reviewers: rovka, efriedma, kristof.beyls

Reviewed By: efriedma

Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72758
2020-02-18 15:53:29 +00:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
James Clarke b3cd44f80b Use SETNE directly rather than SUB/SETNE 0 for stack guard check
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.

Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish

Reviewed By: lebedev.ri

Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74454
2020-02-18 13:21:26 +00:00
Djordje Todorovic a5ac8ca3e0 [CSInfo][TailDuplicator] Delete the call site info when removing dead MBBs
This is needed for the debug entry values feature.

Differential Revision: https://reviews.llvm.org/D74702
2020-02-18 12:29:51 +01:00
Jim Lin 466f8843f5 [NFC] Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h,td}
2020-02-18 10:49:13 +08:00
Vedant Kumar 3f148eabe0 [LiveDebugValues] Visit open var locs just once in transferRegisterDef, NFC
For a file in WebKit, this brings the time spent in LiveDebugValues down
from 16 minutes to 2 minutes. The reduction comes from iterating the set
of open variable locations just once in transferRegisterDef. Post-patch,
the most expensive item inside of transferRegisterDef is a call to
VarLoc::isDescribedByReg, which we have to do.

Testing: I built LNT using the Os-g cmake cache with & without this
patch, then diffed the object files to verify there was no binary diff.

rdar://59446577

Differential Revision: https://reviews.llvm.org/D74633
2020-02-17 14:04:22 -08:00
Matt Arsenault 0e2eb357e0 GlobalISel: Extend narrowing to G_ASHR 2020-02-17 10:42:59 -08:00
Matt Arsenault 8550859535 GlobalISel: Extend shift narrowing to G_SHL 2020-02-17 09:13:37 -08:00
Benjamin Kramer 564a9de28e Hide implementation details. NFC> 2020-02-17 17:55:23 +01:00
Simon Pilgrim a1585aec6f [SelectionDAG] Expose the "getValidShiftAmount" helpers available. NFCI.
These are going to be useful in TargetLowering::SimplifyDemandedBits, so expose these helpers outside of SelectionDAG.cpp

Also add an getValidShiftAmountConstant early-out to getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant so we can use them for scalar cases as well.
2020-02-17 16:28:46 +00:00
Matt Arsenault 78d455adf0 GlobalISel: Add combine to narrow G_LSHR
Produce an unmerge to a narrower type and introduce a narrower shift
if needed. I wasn't sure if there was a better way to parameterize the
target's preferred shift type for the GICombineRule, so manually call
the combine helper.
2020-02-17 08:04:52 -08:00
Sander de Smalen a7a96c726e [AArch64] Implement passing SVE vectors by ref for AAPCS.
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.

Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71216
2020-02-17 15:20:28 +00:00
Sjoerd Meijer dad5f00e3b [DAGCombine] Combine pattern for REV16
This adds another pattern to the combiner for a case that we were not handling
to generate the REV16 instruction for ARM/Thumb2 and a bswap+ror on X86.

Differential Revision: https://reviews.llvm.org/D74032
2020-02-17 14:54:17 +00:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Fangrui Song 549b436beb [MC] De-capitalize MCStreamer::Emit{Bundle,Addrsig}* etc
So far, all non-COFF-related Emit* functions have been de-capitalized.
2020-02-15 09:11:48 -08:00
Simon Pilgrim ce2b5f1569 Fix gcc9.2 -Winit-list-lifetime warning. NFCI.
Reported by @lbenes (Luke Benes)
2020-02-15 16:48:51 +00:00
Fangrui Song 774971030d [MCStreamer] De-capitalize EmitValue EmitIntValue{,InHex} 2020-02-14 23:08:40 -08:00
Fangrui Song 895cad1a13 [AsmPrinter][XRay] Omit unique ID for xray_instr_map and xray_fn_idx
Follow-up for D74006.
2020-02-14 21:10:46 -08:00
Diogo Sampaio 8bc790f9e6 [AArch64][FPenv] Update chain of int to fp conversion
Summary:
When using strict fp, it is required to update the
chain when performing integer type promotion of a
operand to a integer to floating point conversion.

Reviewers: craig.topper, john.brawn

Reviewed By: craig.topper

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74597
2020-02-15 05:07:34 +00:00
Fangrui Song f554e27224 [AsmPrinter] Omit unique ID for __patchable_function_entries sections
Follow-up for D74006.

When the integrated assembler is used, we use SHF_LINK_ORDER.  The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
2020-02-14 20:54:54 -08:00
Fangrui Song 1dc16c752d [MC] Add MCSection::NonUniqueID and delete one MCContext::getELFSection overload 2020-02-14 20:25:52 -08:00
Fangrui Song 6d2d589b06 [MC] De-capitalize another set of MCStreamer::Emit* functions
Emit{ValueTo,Code}Alignment Emit{DTP,TP,GP}* EmitSymbolValue etc
2020-02-14 19:26:52 -08:00
Fangrui Song a55daa1461 [MC] De-capitalize some MCStreamer::Emit* functions 2020-02-14 19:11:53 -08:00
Matt Arsenault 3bb0ff8341 GlobalISel: Remove unused function argument 2020-02-14 15:57:39 -08:00
Sean Fertile b75692c30e [AsmPrinter] Use the McASMInfo to determine if we need descriptors.
In https://reviews.llvm.org/rG8b737688c21a9755cae14cb9343930e0882164ab I
switched the condition gating the creation of the descriptor symbol from
checking the MCAsmInfo if we need to support descriptors, to if the OS
was AIX. Technically the 2 should be interchangeable: if we are
targeting AIX then we need to emit XCOFF object files, and the MCAsmInfo
must return true for needing function descriptors.

This doesn't account for lit test with runsteps that only set the arch.
Eg: test/CodeGen/XCore/section-name.ll
which when run natively on AIX we end up with a target xcore-ibm-aix and
needFunctionDescriptors is false.

This patch reverts to using the MCAsmInfo and adds an assert that the
target OS must be AIX since that is the only target using the descriptor
hook.

Differential Revision: https://reviews.llvm.org/D74622
2020-02-14 15:20:39 -05:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Volkan Keles 187686a22f [GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.

https://reviews.llvm.org/D70564
2020-02-14 10:45:58 -08:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Fangrui Song bcd24b2d43 [AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI* 2020-02-13 22:08:55 -08:00
Fangrui Song 1d49eb00d9 [AsmPrinter] De-capitalize all AsmPrinter::Emit* but EmitInstruction
Similar to rL328848.
2020-02-13 17:06:24 -08:00
Vedant Kumar 3091049446 Add dbgs() output to help track down missing DW_AT_location bugs, NFC 2020-02-13 14:38:44 -08:00
Vedant Kumar 8e77b33b3c [Local] Do not move around dbg.declares during replaceDbgDeclare
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.

Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.

However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.

Testing: stage2 RelWithDebInfo sanitized build, check-llvm

rdar://59397340

Differential Revision: https://reviews.llvm.org/D74517
2020-02-13 14:35:02 -08:00
Fangrui Song 0bc77a0f0d [AsmPrinter] De-capitalize some AsmPrinter::Emit* functions
Similar to rL328848.
2020-02-13 13:38:33 -08:00
Fangrui Song 0dce409cee [AsmPrinter] De-capitalize Emit{Function,BasicBlock]* and Emit{Start,End}OfAsmFile 2020-02-13 13:22:49 -08:00
Matt Arsenault de256478e6 GlobalISel: Don't use LLT references
These should always be passed by value
2020-02-13 15:25:30 -05:00
Simon Pilgrim 32176133fa Move FIXME to start of comment so visual studio actually tags it. NFC. 2020-02-13 14:28:50 +00:00
Serguei Katkov a6f38b4697 [Statepoint] Remove redundant clear of call target on register
Patchable statepoint is lowered into sequence of nops, so zeroed call target
should not be on register. It is better to use getTargetConstant instead
of getConstant to select zero constant for call target.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D74465
2020-02-13 10:25:50 +07:00
Fangrui Song c662795b07 [AsmPrinter][ELF] Emit local alias for ExternalLinkage dso_local GlobalAlias 2020-02-12 17:08:22 -08:00
Guozhi Wei 369d086d78 [MBP] Partial tail duplication into hot predecessors
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.

This patch improves tail duplication in 3 aspects:

  A successor can be duplicated into part of its predecessors.
  A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
  If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.

Differential Revision: https://reviews.llvm.org/D73387
2020-02-12 15:22:33 -08:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Simon Pilgrim 9eb426c88c [TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.

This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.

It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on.

Differential Revision: https://reviews.llvm.org/D74221
2020-02-12 11:51:42 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Clement Courbet 15488ff24b [CodeGen] Fix the computation of the alignment of split stores.
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877

Reviewers: gchatelet, spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74311
2020-02-12 10:37:30 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Nicolai Hähnle 07a5b849f7 SelectionDAG: Fix bug in ClusterNeighboringLoads
Summary:
The method attempts to find loads that can be legally clustered by
looking for loads consuming the same chain glue token.

However, the old code looks at _all_ users of values produced by the
chain node -- including uses of the loaded/returned value of volatile
loads or atomics. This could lead to circular dependencies which then
failed during scheduling.

With this change, we filter out users by getResNo, i.e. by which
SDValue value they use, to ensure that we only look at users of the
chain glue token.

This appears to be a rather old bug, which is perhaps surprising.
However, the test case is actually quite fragile (i.e., it is hidden
by fairly small changes), and the test _must_ use volatile loads for
the bug to manifest.

Reviewers: arsenm, bogner, craig.topper, foad

Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74253
2020-02-12 09:12:55 +01:00
Craig Topper 0daf9b8e41 [X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.

Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
2020-02-11 22:30:04 -08:00
lewis-revill a6bd1256ce [DebugInfo] Call site entries cannot be generated for FrameSetup calls
Instructions marked as FrameSetup do not cause requestLabelAfterInsn to
be called and so no such label is generated. Call instructions which
require call site entries to be generated require this label to be
present in order to calculate the return PC offset/address, but the
check for whether the call instruction is marked as FrameSetup was not
present.

Therefore in the case where a call instruction is marked as FrameSetup,
an assertion failure occurs if a call site entry is to be generated.
This is the case with RISC-V's implementation of save/restore via
library calls.

Differential Revision: https://reviews.llvm.org/D71593
2020-02-11 21:23:18 +00:00
OCHyams 35e0ab647b [DebugInfo][NFC] Fixup the UserValue methods to use FragmentInfo
Fixup the UserValue methods to use FragmentInfo instead of DIExpression because
the DIExpression is only ever used to get the to get the FragmentInfo. The
DIExpression is meaningless in the UserValue class because each definition point
added to a UserValue may have a unique DIExpression.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74057
2020-02-11 10:20:24 +00:00
OCHyams 3aa33fde03 [DebugInfo][NFC] Rename the class DbgValueLocation to DbgVariableValue
Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to
DbgValue. These names better express the new semantics introduced in D74053.

The class previously represented a { Location } only. It now represents a
{ Location, DIExpression } pair which together describe a value.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74055
2020-02-11 10:20:24 +00:00
OCHyams 1e40799324 [DebugInfo] Teach LDV how to handle identical variable fragments
LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE
intervals. DBG_VALUEs are filtered into an interval map based on their {
Variable, DIExpression }. The interval map will coalesce adjacent entries that
use the same { Location }.  Under this model, DBG_VALUEs which refer to the same
bits of the same variable will be filtered into different interval maps if they
have different DIExpressions which means the original intervals will not be
properly preserved.

This patch fixes the problem by using { Variable, Fragment } to filter the
DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same
{ Location, DIExpression } pair.

The solution is not perfect because we see the similar issues appear when
partially overlapping fragments are encountered, but is far simpler than a
complete solution (i.e. D70121).

Fixes: pr41992, pr43957
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74053
2020-02-11 10:20:24 +00:00
Amara Emerson 067dd9c6b1 [GlobalISel][CallLowering] Use stripPointerCasts().
A downstream test exposed a simple logic bug with the manual pointer
stripping code, fix that by just using stripPointerCasts() on the value.

I don't think there's a way to expose this issue upstream.
2020-02-10 15:43:57 -08:00
Matt Arsenault f270da6bfc RegisterCoalescer: Add LaneMask to debug printing 2020-02-10 12:34:33 -08:00
diggerlin aa86311e62 [AIX][XCOFF] Support Mergeable2ByteCString and Mergeable4ByteCString
SUMMARY:
The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString

Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D74164
2020-02-10 14:45:54 -05:00
Sebastian Neubauer 7cddd15e56 [SelectionDAG] Optimize build_vector of truncates and shifts
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.

Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.

Simplify (build_vec (trunc $1)
                    (trunc (srl $1 width))
                    (trunc (srl $1 (2 * width))) ...)
to (bitcast $1)

Differential Revision: https://reviews.llvm.org/D73892
2020-02-10 15:04:07 +01:00
Djordje Todorovic 3a4dc577c9 [CSInfo] Fix the assertions regarding updating the CSInfo
The call site info was not updated correctly when deleting
corresponding call instructions.

Differential Revision: https://reviews.llvm.org/D73700
2020-02-10 10:55:06 +01:00
Djordje Todorovic 68908993eb [CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo
Use the isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring
the moveCallSiteInfo() and copyCallSiteInfo() operate with call site
entry candidates (both Src and Dest should be the call site entry
candidates).

Differential Revision: https://reviews.llvm.org/D74122
2020-02-10 10:03:14 +01:00
Amara Emerson 21c9d9ad43 [GlobalISel][CallLowering] Tighten constantexpr check for callee.
I'm not sure there's a test case for this, but it's better to be safe.
2020-02-09 22:59:48 -08:00
Matt Arsenault 312a9d1b83 GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF
Narrow these for 64-bit VALU for AMDGPU.
2020-02-09 19:02:38 -05:00
Matt Arsenault 6135f5eda4 GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ
The result type is separate from the source type.
2020-02-09 18:11:43 -05:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Craig Topper 2af1640f9a [LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.

This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.

Differential Revision: https://reviews.llvm.org/D74111
2020-02-07 22:25:56 -08:00
Amara Emerson 35c63d66aa [GlobalISel][CallLowering] Look through bitcasts from constant function pointers.
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.

Differential Revision: https://reviews.llvm.org/D74241
2020-02-07 15:32:54 -08:00
Vedant Kumar 0d0ef315cb [MachineInstr] Add isCandidateForCallSiteEntry predicate
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.

For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.

Differential Revision: https://reviews.llvm.org/D74159
2020-02-07 10:10:41 -08:00
Petar Avramovic 7df5fc9e03 [GlobalISel] Add buildMerge with SrcOp initializer list
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.

Differential Revision: https://reviews.llvm.org/D74223
2020-02-07 18:43:45 +01:00
Amara Emerson 28d22c2c9c [GlobalISel][IRTranslator] Add special case support for ~memory inline asm clobber.
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.

Differential Revision: https://reviews.llvm.org/D74201
2020-02-07 08:55:23 -08:00
Jinsong Ji 01edae1271 [AsmPrinter] Print FP constant in hexadecimal form instead
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.

But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.

Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.

This patch try to print all FP constant in hex form instead.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D73566
2020-02-07 16:00:55 +00:00
Matt Arsenault 3b198518ad GlobalISel: Fix narrowing of G_CTPOP
The result type is separate from the source type. Tests will be
included in a future AMDGPU patch which uses this from
RegBankSelect/applyMappingImpl.
2020-02-07 06:58:00 -08:00
Matt Arsenault 8de2dad9e0 GlobalISel: Fix lowering of G_CTLZ/G_CTTZ
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
2020-02-07 06:54:12 -08:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Sourabh Singh Tomar 84e5760a16 [DebugInfo]: Reorderd the emission of debug_str section.
Summary:
This patch reorders the emission of debug_str section, so that
string can come after macros.
This is necessary for macro forms like DW_MACRO_define_strp,
which emits macro as a string in debug_str section.
2020-02-07 11:15:55 +05:30
Amara Emerson ac8a12c874 [GlobalISel] Use G_ZEXTLOAD instead of an anyextending load for non-pow-2 legalization.
Fixes PR43288
2020-02-06 14:36:36 -08:00
Konstantin Schwarz 76986bdc46 [GlobalISel] Legalize more G_FP(EXT|TRUNC) libcalls.
This adds a new helper function for retrieving the
floating point type corresponding to the specified
bit-width.
2020-02-06 11:41:34 -08:00
Fangrui Song 727362e87b [MC][ELF] Rename MC related "Associated" to "LinkedToSym"
"linked-to section" is used by the ELF spec. By analogy, "linked-to
symbol" is a good name for the signature symbol.  The word "linked-to"
implies a directed edge and makes it clear its relation with "sh_link",
while one can argue that "associated" means an undirected edge.

Also, combine tests and add precise SMLoc to improve diagnostics.

Reviewed By: eugenis, grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D74082
2020-02-06 11:31:04 -08:00
Jeremy Morse 6531a78ac4 Revert "[DebugInfo] Remove some users of DBG_VALUEs IsIndirect field"
This reverts commit ed29dbaafa.

I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.

Conflicts:
	llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
	llvm/test/DebugInfo/X86/dbg-addr-dse.ll

SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.

There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
2020-02-06 14:41:40 +00:00
Jeremy Morse ece761427f Revert "[DebugInfo][DAG] Distinguish different kinds of location indirection"
This reverts commit 3137fe4d23.

I'm backing out D68945, which this patch is a follow up for. It'll be
re-landed when D68945 is fixed.

The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of location now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
2020-02-06 14:41:40 +00:00
Jeremy Morse ed5998d21e Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit 2d3174c4df.

The overall solution for this problem is reverting D68945, which wasn't
handling the -O0 path through the codegen backend correctly. See:
discussion in D73526.
2020-02-06 14:41:39 +00:00
Sjoerd Meijer 93b0536fd2 [RDA] getInstFromId: find instructions. NFC.
To find the instruction in the block for a given ID, first a count and then a
lookup was performed in the map, which is almost the same thing, thus doing
double the work.

Differential Revision: https://reviews.llvm.org/D73866
2020-02-06 14:13:31 +00:00
Sam Parker 0a8cae10fe [ReachingDefs] Make isSafeToMove more strict.
Test that we're not moving the instruction through instructions with
side-effects.

Differential Revision: https://reviews.llvm.org/D74058
2020-02-06 14:06:08 +00:00
Diogo Sampaio 8ba2b62810 [ARM] Fix non-determenistic behaviour
Summary:
ARM Type Promotion pass does not clear
the container that defines if one variable
was visited or not, missing optimization
opportunities by luck when two llvm:Values
from different functions  are allocated at
the same memory address.

Also fixes a comment and uses existing
method to pop and obtain last element
of the worklist.

Reviewers: samparker

Reviewed By: samparker

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73970
2020-02-06 09:21:13 +00:00
Matt Arsenault 7464e8d6ad GlobalISel: Remove check for illegal MIR
The verifier will catch this.
2020-02-05 18:37:17 -05:00
Jonas Paulsson 96ea377ea4 [PHIElimination] Compile time optimization for huge functions.
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariabless::addNewBlock().

In all the huge test programs where this slowdown has been noticable, it has
dissapeared entirely with this patch.

Review: Björn Pettersson, Quentin Colombet.

Differential Revision: https://reviews.llvm.org/D73152
2020-02-05 18:10:03 -05:00
Matt Arsenault 9087ef0765 GlobalISel: Allow CSE of G_IMPLICIT_DEF
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
2020-02-05 17:47:21 -05:00
Shu-Chun Weng ce9633633c [GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.

This change adds another function to directly return the register that
is the source of the register and use that for folding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44783

Differential Revision: https://reviews.llvm.org/D74005
2020-02-05 10:38:35 -08:00
Simon Pilgrim 4592bb7195 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Djordje Todorovic de90d73e03 [DebugInfo] Avoid the call site param for mem instrs with multiple defs
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.

Differential Revision: https://reviews.llvm.org/D73954
2020-02-05 10:03:14 +01:00
Thomas Lively 649aba93a2 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
David Blaikie ec50e10db4 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
Originally committed in: 1ced28cbe7
            Reverted in: f75301d16d

(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)

This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-04 19:25:47 -08:00
Francis Visoiu Mistrih 7531a5039f [Remarks] Extend the RemarkStreamer to support other emitters
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.

See changes in llvm/docs/Remarks.rst for motivation and design choices.

Differential Revision: https://reviews.llvm.org/D73676
2020-02-04 17:16:02 -08:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Matt Arsenault 23b76096b7 CodeGenPrepare: Reorder check for cold and shouldOptimizeForSize
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
2020-02-04 11:23:13 -08:00
Matt Arsenault de8451fe4d GlobalISel: Fold SmallVector resizes into constructors 2020-02-04 10:28:08 -08:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Nico Weber f75301d16d Revert "DebugInfo: Check DW_OP_convert in loclists with Split DWARF"
and follow-ups.

This reverts commit 1ced28cbe7.
This reverts commit 4f281f0474.
This reverts commit 552a8fe12b.

The test fails on non-Linux.
2020-02-04 10:06:46 -05:00
Jeremy Morse 41206b61e3 [DebugInfo] Re-instate LiveDebugVariables scope trimming
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.

The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.

live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.

Differential Revision: https://reviews.llvm.org/D73691
2020-02-04 14:51:06 +00:00
Filipe Cabecinhas abada5036e [NFC] Fix some spelling mistakes to test pushing to GH. 2020-02-04 11:07:31 +00:00
Simon Pilgrim 3dd688a9ee [DAG] OptLevelChanger - fix uninitialized variable analyzer warning (PR44471)
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.

This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.

Differential Revision: https://reviews.llvm.org/D73875
2020-02-04 10:54:33 +00:00
David Green 362d00e051 [ARM][VecReduce] Force expand vector_reduce_fmin
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.

Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
2020-02-04 09:36:59 +00:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
David Blaikie 1ced28cbe7 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-03 19:16:42 -08:00
David Blaikie 031f83fb82 DebugInfo: Simplify emitDebugLocEntry by never passing a null CU 2020-02-03 18:47:14 -08:00
Matt Arsenault cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Quentin Colombet f26ff8c9df [TargetRegisterInfo] Make the heuristic to skip region split overridable by the target
RegAllocGreedy uses a fairly compile time intensive splitting heuristic
called region splitting. This heuristic was disabled via another heuristic
when it is likely that it won't be worth the compile time. The only way
to control this other heuristic was via a command line option (huge-size-for-split).

This commit gives more control on this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.

The default implementation of this hook keeps the heuristic as it was
before this patch.
2020-02-03 11:30:35 -08:00
Simon Pilgrim 61621f826a [TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
2020-02-03 16:50:04 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Guillaume Chatelet fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
Guillaume Chatelet 75d9994a51 Fix broken invariant
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73791
2020-02-03 11:01:05 +01:00
Fangrui Song 44cdae68c3 [CodeGenPrepare] Delete dead !DL check
Follow-up for D73754

DL is assigned in CodeGenPrepare::runOnFunction and is guaranteed to be
non-null.
2020-02-02 09:49:06 -08:00
Fangrui Song 5a56a25b0b [CodeGenPrepare] Make TargetPassConfig required
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73754
2020-02-02 09:28:45 -08:00
Fangrui Song 5932f7b8f2 [PatchableFunction] Use an empty DebugLoc
The current FirstMI.getDebugLoc() is actually null in almost all cases.
If it isn't, the generated .loc will be considered initial. The .loc
will have the prologue_end flag and terminate the prologue prematurely.

Also use an overload of BuildMI that will not prepend
PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.
2020-02-01 14:12:06 -08:00
Craig Topper 943b5561d6 [LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html

The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.

It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.

This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.

Differential Revision: https://reviews.llvm.org/D73749
2020-02-01 11:21:04 -08:00
Matt Arsenault bc101ffd77 GlobalISel: Support widening unmerge results with pointer source 2020-02-01 10:47:03 -05:00
David Blaikie 338beff4dc DwarfDebug.cpp: Fix some indentation 2020-01-31 16:01:57 -08:00
David Blaikie b33e5f3c3e DebugInfo: Split DWARF: Hash non-member function child DIEs
Significant missing hashing - as per the comment this was only meant to
skip member functions (unspecified, but I think it's legible as member
function declarations, not definitions) but was skipping all named
subprograms (so only hashed child DIEs for member function definitions -
because they didn't have a direct name, but only a name given indirectly
in the DW_AT_specification-referenced DIE)
2020-01-31 15:32:03 -08:00
Matt Arsenault 792d9b5719 DAG: Check if a value is divergent before requiresUniformRegister
This avoids a potentially expensive scan if we already know it doesn't
matter.
2020-01-31 15:27:18 -08:00
Jay Foad f465b1aff4 [GlobalISel] Tweak lowering of G_SMULO/G_UMULO
Summary:
Applying this cleanup:

    -      MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
    -        .addDef(Shifted)
    -        .addUse(Res)
    -        .addUse(ShiftAmt);
    +      MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);

caused an assertion failure here:

    llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.

    #4  0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
    #5  0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
    #6  0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
    #7  0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
    #8  0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
    #9  0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
    #10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020

because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.

Reviewers: arsenm, aditya_nandakumar

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72842
2020-01-31 19:21:01 +00:00
Simon Pilgrim 8fbc7fd567 [DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors
If we don't demand any elements of the inserted subvector then just skip it.
2020-01-31 18:57:22 +00:00
Simon Pilgrim 5702dadf6f [DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-31 18:02:34 +00:00
Hiroshi Yamauchi ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Jay Foad 2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Quentin Colombet cfebd77742 [GISel][KnownBits] Fix a bug where we could run out of stack space
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.

Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.

For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
  GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
  TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
                                            // always return false.

This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
2020-01-30 19:30:39 -08:00
Leonard Chan 2d3174c4df [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.

This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 17:09:42 -08:00
Amara Emerson 84bd851108 [GlobalISel][IRTranslator] When translating vector geps, splat the base pointer if required.
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.

This fixes crashes on arm64 GlobalISel with optimizations enabled.
2020-01-30 16:27:27 -08:00
Leonard Chan 3b23453b6c Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit fff6a1b0f1.

This was breaking a bunch of buildbots.
2020-01-30 16:18:41 -08:00
Leonard Chan fff6a1b0f1 [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in
corrupt symbols when debugging.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 15:58:37 -08:00
Matt Arsenault eb7f74e300 CodeGen: Use Register 2020-01-30 15:01:56 -08:00
Sean Fertile 8b737688c2 [AIX] Minor cleanup in AsmPrinter. [NFC]
- Extends the comments related to function descriptors, noting how they
are only used on AIX.

- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.

Differential Revision: https://reviews.llvm.org/D73115
2020-01-30 14:15:02 -05:00
Fangrui Song 06b8e32d4f [AArch64] -fpatchable-function-entry=N,0: place patch label after BTI
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.

```
.Lfunc_begin0:
bti c
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```

This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:

```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```

This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .

A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.

Reviewers: mrutland, nickdesaulniers, nsz, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73680
2020-01-30 11:11:52 -08:00
Matt Arsenault ea956685a1 GlobalISel: Implement s32->s64 G_FPTOSI lowering
Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.
2020-01-30 08:47:07 -05:00
Dominik Montada dc141af755 [GlobalISel] (fix) Use pointer type size for offset constant when lowering stores
Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
2020-01-30 08:32:35 -05:00
Simon Pilgrim 57b0d33224 [DAGCombiner] ISD::AND/OR/XOR - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:53 +00:00
Simon Pilgrim a967aa2706 [DAGCombiner] ISD::SDIV/UDIV/SREM/UREM - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:52 +00:00
Matt Arsenault c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Amara Emerson c12f046eb9 [GlobalISel] Add new combine to convert scalar G_MUL to G_SHL.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.

Vector support not yet implemented.

Differential Revision: https://reviews.llvm.org/D73659
2020-01-29 13:39:00 -08:00
Amara Emerson 0da937bb5c [GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS.
We were needlessly putting known constant values on the LHS of a G_MUL, which
is suboptimal.

Differential Revision: https://reviews.llvm.org/D73650
2020-01-29 11:37:19 -08:00
Fangrui Song 8903e61b66 [AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:

* An assembler conservatively assumes a global default visibility symbol interposable (ELF
  semantics). So relocations in object files are needed even if the code generator assumed
  the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
  A linker conservatively assumes a global default visibility symbol interposable (if not
  otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).

"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very
few objects LLVM interpret specially.  So the set just includes `external`.

This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.

LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:

* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
  GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
  outstanding relocation), if the target is defined in the same section. Put it simply, even if IR
  optimizations failed to optimize and allowed interposition for the function call in
  `void foo() {} void bar() { foo(); }`, the assembler would disallow it.

This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D73228
2020-01-29 10:58:43 -08:00
Simon Pilgrim f7245ef897 [DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 18:49:42 +00:00
Adrian Prantl 18dbe1b279 Run clang-format on DwarfExpression (NFC) 2020-01-29 10:23:12 -08:00
Adrian Prantl 816ee8a423 DwarfExpression: Factor out getOrCreateBaseType() (NFC) 2020-01-29 10:23:12 -08:00
Simon Pilgrim 25b8e96388 [DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 17:26:22 +00:00
Simon Pilgrim 4b04e11735 [DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 16:57:13 +00:00
Simon Pilgrim 48bd6a0986 [DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.
2020-01-29 16:57:13 +00:00
Matt Arsenault b63629a58d GlobalISel: Fix mask computation in lowerInsert
This is supposed to be the high bit index, not the width. Use the
wrapping form of getBitsSet and avoid the bitflip.
2020-01-29 08:25:36 -08:00
Jay Foad 0d7bd34312 [MachineScheduler] Ignore artificial edges when forming store chains
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.

In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
      // Copy successor edges from SUa to SUb. Interleaving computation
      // dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.

Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.

The fix is to only consider non-artificial control dependencies when
forming chains.

Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71717
2020-01-29 16:23:01 +00:00
Matt Arsenault f717483acd GlobalISel: Assert on invalid bitcast in MIRBuilder
The other casts validate, so this should too.
2020-01-29 07:49:39 -08:00
Matt Arsenault c5c1bb3374 GlobalISel: Lower G_WRITE_REGISTER 2020-01-29 06:48:24 -08:00
David Stenberg 6a2413c435 [ARM64] Debug info for structure argument missing DW_AT_location
Summary:
Prevent eliminating dbg_val due to COPY.

Fixes this
https://bugs.llvm.org/show_bug.cgi?id=40709

Patch by: Kamlesh Kumar (kamleshbhalui)

Reviewers: aprantl, dblaikie, vsk, dsanders

Reviewed By: dsanders

Subscribers: dstenb, kristof.beyls, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73159
2020-01-29 10:56:23 +01:00
Sam Parker ac30ea2f87 [RDA][ARM] Move functionality into RDA
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt

And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove

Both of these have some wrappers too to make them more convienent to
use.

Differential Revision: https://reviews.llvm.org/D73460
2020-01-29 03:27:47 -05:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Michael Spang a2fb2c0ddc [GlobalMerge] Preserve symbol visibility when merging globals
Symbols created for merged external global variables have default
visibility. This can break programs when compiling with -Oz
-fvisibility=hidden as symbols that should be hidden will be exported at
link time.

Differential Revision: https://reviews.llvm.org/D73235
2020-01-28 13:26:18 -08:00
Hiroshi Yamauchi 2c03c899d5 [MBFI] Move BranchFolding::MBFIWrapper to its own files. NFC.
Summary:
To avoid header file circular dependency issues in passing updated MBFI (in
MBFIWrapper) to the interface of profile guided size optimizations.

A prep step for (and split off of) D73381.

Reviewers: davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73494
2020-01-28 10:58:46 -08:00
Sam Parker 7ad879caa0 [NFC][RDA] typedef SmallPtrSetImpl<MachineInstr*> 2020-01-28 13:15:44 +00:00
Wang, Pengfei 3239b5034e [FPEnv] Add pragma FP_CONTRACT support under strict FP.
Summary: Support pragma FP_CONTRACT under strict FP.

Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3

Subscribers: hiraditya, jdoerfert, cfe-commits, llvm-commits, LuoYuanke

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72820
2020-01-28 20:43:43 +08:00
Guillaume Chatelet 879c825cb8 [instrinsics] Add @llvm.memcpy.inline instrinsics
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.

Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71710
2020-01-28 09:42:01 +01:00
Fangrui Song c7c5da6df3 Reland "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()""
Reland 7a8b0b1595, with a fix that checks
`!E.value().empty()` to avoid inserting a zero to SlotRemap.

Debugged by rnk@ in https://bugs.chromium.org/p/chromium/issues/detail?id=1045650#c33

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D73510
2020-01-27 15:58:49 -08:00
Jay Foad cbbbd5b5f6 [GlobalISel] Make use of KnownBits::computeForAddSub
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.

Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73431
2020-01-27 22:22:56 +00:00
Simon Pilgrim e7e043724e [DAG] Enable ISD::EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.

Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-27 21:17:47 +00:00
Adrian Prantl a095d149c2 Fix an assertion failure in DwarfExpression's subregister composition
This patch fixes an assertion failure in DwarfExpression that is
triggered when a complex fragment has exactly the size of a
subregister of the register the DBG_VALUE points to *and* there is no
DWARF encoding for the super-register.

I took the opportunity to replace/document some magic values with
static constructor functions to make this code less confusing to read.

rdar://problem/58489125

Differential Revision: https://reviews.llvm.org/D72938
2020-01-27 12:44:37 -08:00
Vedant Kumar e08f205f5c Reland (again): [DWARF] Allow cross-CU references of subprogram definitions
This is a revert-of-revert (i.e. this reverts commit 802bec89, which
itself reverted fa4701e1 and 79daafc9) with a fix folded in. The problem
was that call site tags weren't emitted properly when LTO was enabled
along with split-dwarf. This required a minor fix. I've added a reduced
test case in test/DebugInfo/X86/fission-call-site.ll.

Original commit message:

This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.

We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.

This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.

As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.

Update #1:

Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:

1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
   to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
   list).

It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.

Update #2:

Fold in a fix for call site tag emission in the split-dwarf + LTO case.

Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.

rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440

Differential Revision: https://reviews.llvm.org/D70350
2020-01-27 10:52:34 -08:00
Nico Weber 68051c1224 Revert "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()"
This reverts commit 7a8b0b1595.
It seems to break exception handling on 32-bit Windows, see
https://crbug.com/1045650
2020-01-27 11:22:33 -05:00
Dominik Montada 9965b12fd1 Use pointer type size for offset constant when lowering load/stores 2020-01-27 06:55:32 -08:00
Matt Arsenault 2a160ba5b0 GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results
Only use shifts if the requested type exactly matches the source type,
and create sub-unmerges otherwise.
2020-01-27 06:18:26 -08:00
Matt Arsenault 06d9230fef GlobalISel: Translate vector GEPs 2020-01-27 05:35:05 -08:00
Igor Kudrin 8f3d47c54a [DWARF] Do not pass Version to DWARFExpression. NFCI.
The Version was used only to determine the size of an operand of
DW_OP_call_ref. The size was 4 for all versions apart from 2, but
the DW_OP_call_ref operation was introduced only in DWARF3. Thus,
the code may be simplified and using of Version may be eliminated.

Differential Revision: https://reviews.llvm.org/D73264
2020-01-27 19:08:46 +07:00
David Stenberg 13d4ef9ac0 Improvements to call site register worklist
Summary:
This fixes PR44118. For cases where we have a chain like this:

  R8 = R1 (entry value)
  R0 = R8
  call @foo R0

the code that emits call site entries using entry values would not
follow that chain, instead emitting a call site entry with R8 as
location rather than R0. Such a case was discovered when originally
adding dbgcall-site-orr-moves.mir. This patch fixes that issue. This is
done by changing the ForwardedRegWorklist set to a map in which the
worklist registers always map to the parameter registers that they
describe.

Another thing this patch fixes is that worklist registers now can
describe more than one parameter register at a time. Such a case
occurred in dbgcall-site-interpretation.mir, resulting in a call site
entry not being emitted for one of the parameters.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73168
2020-01-27 12:41:42 +01:00
David Stenberg b46baa82fc Don't separate imp/expl def handling for call site params
Summary:
Since D70431 the describeLoadedValue() hook takes a parameter register,
meaning that it can now be asked to describe any register. This means
that we can drop the difference between explicit and implicit defines
that we previously had in collectCallSiteParameters().

I have not found any case for any upstream targets where a parameter
register is only implicitly defined, and does not overlap with any
explicit defines. I don't know if such a case would even make sense. So
as far as I have tested, this patch should be a non-functional change.
However, this reduces the complexity of the code a bit, and it will
simplify the implementation of an upcoming patch which solves PR44118.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73167
2020-01-27 11:31:09 +01:00
Petar Avramovic cbf03aee6d [MIPS GlobalISel] Select population count (popcount)
G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates
these intrinsics from __builtin_popcount and __builtin_popcountll.
Add lower and narrow scalar for G_CTPOP.
Lower G_CTPOP for MIPS32.

Differential Revision: https://reviews.llvm.org/D73216
2020-01-27 09:59:50 +01:00
Petar Avramovic 8bc7ba5b9e [MIPS GlobalISel] Select count trailing zeros
llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_ctz and
__builtin_ctzll.
G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true).
Clang generates such intrinsics as parts of expansion of builtin_ffs
and builtin_ffsll. It is also traditionally part of and many
algorithms that are now predicated on avoiding zero-value inputs.

Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ.
Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32.

Differential Revision: https://reviews.llvm.org/D73215
2020-01-27 09:51:06 +01:00
Petar Avramovic 2b66d32f3f [MIPS GlobalISel] Select count leading zeros
llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
MIPS clz instruction returns 32 for zero input.
G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_clz and
__builtin_clzll.
G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as
second argument. It is also traditionally part of and many algorithms
that are now predicated on avoiding zero-value inputs.

Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF).
Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32.

Differential Revision: https://reviews.llvm.org/D73214
2020-01-27 09:43:38 +01:00
Fangrui Song 941f20c3bd [MachineVerifier] Simplify and delete LLVM_VERIFY_MACHINEINSTRS from a comment. NFC
The environment variable has been unused since r228079.
2020-01-27 00:31:23 -08:00
Wang, Pengfei 17b8f96d65 [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION,
and macro FUNCTION likewise. NFCI.

Some functions like fmuladd don't really have a node, we should divide
the declaration form those have node to avoid introducing fake nodes.

Differential Revision: https://reviews.llvm.org/D72871
2020-01-27 10:38:05 +08:00
Simon Pilgrim 4a5f9d9faf [TargetLowering] Respect recursive depth in SimplifyDemandedBits call to ComputeNumSignBits 2020-01-26 10:01:56 +00:00
Simon Pilgrim 3daa71ee00 [SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops 2020-01-25 20:21:14 +00:00
Simon Pilgrim 3f8916b2e8 [SelectionDAG] ComputeNumSignBits - add support for rotate non-uniform vector amounts 2020-01-25 19:15:05 +00:00
Simon Pilgrim e3c26a9d1b [SelectionDAG] ComputeNumSignBits - add support for rotate uniform vector amounts 2020-01-25 18:55:47 +00:00
Simon Pilgrim c8de7c8f50 [TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit
Differential Revision: https://reviews.llvm.org/D73412
2020-01-25 17:36:46 +00:00
Vedant Kumar 802bec8961 Revert "Reland: [DWARF] Allow cross-CU references of subprogram definitions"
... as well as:
Revert "[DWARF] Defer creating declaration DIEs until we prepare call site info"

This reverts commit fa4701e197.

This reverts commit 79daafc903.

There have been reports of this assert getting hit:

CalleeDIE && "Could not find DIE for call site entry origin
2020-01-24 18:07:54 -08:00
Quentin Colombet 5d87b5d202 [GISelKnownBits] Add support for PHIs
Teach the GISelKnowBits analysis how to deal with PHI operations.
PHIs are essentially COPYs happening on edges, so we can just reuse
the code for COPY.

This is NFC COPY-wise has we leave Depth untouched when calling
computeKnownBitsImpl for COPYs, like it was before this patch.
Increasing Depth is however required for PHIs as they may loop back to
themselves and we would end up in an infinite loop if we were not
increasing Depth.

Differential Revision: https://reviews.llvm.org/D73317
2020-01-24 16:43:52 -08:00
@justice_adams (Justice Adams) daee63f974 [SelectionDag] Updated FoldConstantArithmetic method signature in preparation for merge with FoldConstantVectorArithmetic
Updated FoldConstantArithmetic method signature to match that of
FoldConstantVectorArithmetic in preparation for merging the two
functions together

https://bugs.llvm.org/show_bug.cgi?id=36544

This is the first step in combining the various
FoldConstantVectorArithmetic and FoldConstantVectorArithmetic
functions into one FoldConstantArithmetic function.

Differential Revision: https://reviews.llvm.org/D72870
2020-01-24 18:00:58 -05:00
Craig Topper d3bf06bc81 [DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition.
Unlike the existing code that I modified here, I only handle the
case where the strict_fsetcc has a single use. Not sure exactly
how to handle multiples uses.

Testing this on X86 is hard because we already have a other
combines that get rid of lowered version of the integer setcc that
this xor will eventually become. So this combine really just
saves a bunch of extra nodes being created. Not sure about other
targets.

Differential Revision: https://reviews.llvm.org/D71816
2020-01-24 14:15:36 -08:00
Stanislav Mekhanoshin be8e38cbd9 Correct NumLoads in clustering
Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.

This is NFC for in tree targets.

Differential Revision: https://reviews.llvm.org/D73292
2020-01-24 12:45:28 -08:00
Stanislav Mekhanoshin 7a94d4f4ee Allow combining of extract_subvector to extract element
Differential Revision: https://reviews.llvm.org/D73132
2020-01-24 10:50:26 -08:00
Fangrui Song 50a3ff30e1 [PatchableFunction] Allow empty entry MachineBasicBlock
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D73301
2020-01-24 09:42:48 -08:00
Tom Weaver f5147765ba [DebugInfo][LiveDebugValues] Teach Live Debug Values About Meta Instructions
Previously LiveDebugValues pass would consider meta instructions that 'fiddle' with liveness of registers as register definitions when transfering register defs. This would mean that, for example, a KILL instruction would cause LiveDebugValues to terminate the range of an earlier DBG_VALUE instruction resulting in the none propogation of said DBG_VALUE instructions into later blocks.

This patch adds the check and a helpful comment, fixes a test that previously tested for the broken behaviour by coincidence and adds a test specifically for this.

reviewers: vsk, dstenb, djtodoro

Differential Revision: https://reviews.llvm.org/D73210
2020-01-24 16:29:05 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Simon Pilgrim 0b45c2264a [SelectionDAG] rot(x, y) --> x iff ComputeNumSignBits(x) == BitWidth(x)
Rotating an 0/-1 value by any amount will always result in the same 0/-1 value
2020-01-24 10:35:57 +00:00
Fangrui Song 22467e2595 Add function attribute "patchable-function-prefix" to support -fpatchable-function-entry=N,M where M>0
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.

-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:

```
  .type	foo,@function
.Ltmp0:               # @foo
  nop
foo:
.Lfunc_begin0:
  # optional `bti c` (AArch64 Branch Target Identification) or
  # `endbr64` (Intel Indirect Branch Tracking)
  nop

  .section  __patchable_function_entries,"awo",@progbits,get,unique,0
  .p2align  3
  .quad .Ltmp0
```

-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):

```
(a)         (b)

func:       func:
.Ltmp0:     bti c
  bti c     .Ltmp0:
  nop       nop
```

(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .

Differential Revision: https://reviews.llvm.org/D73070
2020-01-23 17:02:27 -08:00
Simon Pilgrim e25eee4db7 [SelectionDAG] ComputeNumSignBits - add ISD::ADD demanded elts support 2020-01-23 17:48:07 +00:00
Sam Parker 05532575e8 [RDA] Skip debug values
Skip debug instructions when iterating through a block to find uses.

Differential Revision: https://reviews.llvm.org/D73273
2020-01-23 17:04:54 +00:00
Simon Pilgrim 0fec8acdd8 [SelectionDAG] ComputeNumSignBits - add ISD::ADD vector support
Add missing handling for (ADD (AND X, 1), -1) uniform vectors
2020-01-23 16:42:12 +00:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Simon Pilgrim fc5bbbf328 [SelectionDAG] ComputeNumSignBits - add ISD::SUB demanded elts support 2020-01-23 16:20:48 +00:00
Jay Foad b482e1bfe2 [CodeGen] Make use of MachineInstrBuilder::getReg
Reviewers: arsenm

Subscribers: wdng, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73262
2020-01-23 13:38:13 +00:00
Sam Parker 0d1468db58 [NFC][RDA] Make the interface const
Make all the public query methods const.
2020-01-23 13:32:11 +00:00
Simon Pilgrim 48d4ba8fb2 [SelectionDAG] Compute Known + Sign Bits - merge INSERT_VECTOR_ELT known/unknown index paths
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call.
2020-01-23 13:31:37 +00:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Simon Pilgrim 03cae086f4 [SelectionDAG] ComputeKnownBits - merge EXTRACT_VECTOR_ELT known/unknown index paths
Match the approach in SimplifyDemandedBits/ComputeNumSignBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits call.
2020-01-23 11:29:16 +00:00
Simon Pilgrim 98da49d979 [SelectionDAG] Compute Known + Sign Bits - merge INSERT_SUBVECTOR known/unknown index paths
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call, additionally we only ever need original demanded elts of the base vector even if the index is unknown.
2020-01-23 11:29:15 +00:00
Djordje Todorovic 91b0956f38 [NFC][DwarfDebug] Use proper analog GNU attribute for the pc address
The low_pc is analog to the DW_AT_call_return_pc, since it describes
the return address after the call. The DW_AT_call_pc is the address
of the call instruction, and we don't use it at the moment.

Differential Revision: https://reviews.llvm.org/D73173
2020-01-23 12:15:35 +01:00
David Tenty 45a4aaea7f [NFC][XCOFF] Refactor Csect creation into TargetLoweringObjectFile
Summary:
We create a number of standard types of control sections in multiple places for
things like the function descriptors, external references and the TOC anchor
among others, so it is possible for  their properties to be defined
inconsistently in different places. This refactor moves their creation and
properties into functions in the TargetLoweringObjectFile class hierarchy, where
functions for retrieving various special types of sections typically seem
to reside.

Note: There is one case in PPCISelLowering which is specific to function entry
points which we don't address since we don't have access to the TLOF there.

Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast

Reviewed By: jasonliu, hubert.reinterpretcast

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72347
2020-01-22 12:09:11 -05:00
Stanislav Mekhanoshin 2d0fcf786c Precommit NFC part of DAGCombiner change. NFC.
This is NFC part of DAGCombiner::visitEXTRACT_SUBVECTOR()
change in the D73132.
2020-01-22 09:01:22 -08:00
Hiroshi Yamauchi ddbc728828 [PGO][PGSO] Update BFI in CodeGenPrepare::optimizeSelectInst.
Summary:
Without the BFI update, some hot blocks are incorrectly treated as cold code.

This fixes a FDO perf regression in the TSVC benchmark from D71288.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73146
2020-01-22 08:36:54 -08:00
Sander de Smalen 4cf16efe49 [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).

Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71215
2020-01-22 14:32:27 +00:00
Jay Foad e0f0d0e55c [MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.

The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.

The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
  get sorted next to each other, and
- shouldClusterMemOps knows what they mean.

One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.

Differential Revision: https://reviews.llvm.org/D71655
2020-01-22 14:28:24 +00:00
Simon Pilgrim 80656fd7ae [SelectionDAG] getShiftAmountConstant - assert the type is an integer. 2020-01-22 13:52:44 +00:00
Sander de Smalen 67d4c9924c Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Amara Emerson 67a8775322 [AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible.
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.

Differential Revision: https://reviews.llvm.org/D72961
2020-01-21 16:51:31 -08:00
Quentin Colombet ff1f3cc1a1 [GISelKnownBits] Make the max depth a parameter of the analysis
Allow users of that analysis to define the cut off depth of the
analysis instead of hardcoding 6.

NFC as the default parameter is 6.
2020-01-21 11:35:31 -08:00
Thomas Lively 3ef169e586 [WebAssembly][InstrEmitter] Foundation for multivalue call lowering
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.

One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.

These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.

Reviewers: aheejin, dschuff, qcolombet

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71484
2020-01-21 11:13:46 -08:00
Fangrui Song 7a8b0b1595 [StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()
Reviewed By: dantrushin

Differential Revision: https://reviews.llvm.org/D73063
2020-01-21 09:46:27 -08:00
Krzysztof Parzyszek 020041d99b Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.

This change is NFC.
2020-01-21 10:15:38 -06:00
Simon Pilgrim f04284cf1d [TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 15:12:07 +00:00
Simon Pilgrim 47f99d2ca8 [SelectionDAG] GetDemandedBits - remove ANY_EXTEND handling
Rely on SimplifyMultipleUseDemandedBits fallback instead.
2020-01-21 14:39:00 +00:00
Simon Pilgrim 651fa669a2 [TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 14:07:19 +00:00
Simon Pilgrim 5f5f478564 [DAG] Fold extract_vector_elt (scalar_to_vector), K to undef (K != 0)
This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead of the extract is not the first element.

This helps with some cases where 3-vectors are legalized and avoids processing the 4th component.

Original Patch by: arsenm (Matt Arsenault)

Differential Revision: https://reviews.llvm.org/D51589
2020-01-21 10:58:30 +00:00
Simon Pilgrim 8d2e6bdbe1 [TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI.
Matches ISD::SRA + ISD::SRL variants.
2020-01-21 10:40:18 +00:00
Fangrui Song d232c21566 [AsmPrinter] Don't emit __patchable_function_entries entry if "patchable-function-entry"="0"
Add improve tests
2020-01-20 16:13:48 -08:00
Simon Pilgrim 9c06c10fba [SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default.
First step towards removing SelectionDAG::GetDemandedBits entirely since it so similar to SimplifyMultipleUseDemandedBits anyhow.
2020-01-20 16:51:52 +00:00
Awanish Pandey 84c4c87e04 Recommit "[DWARF5][DebugInfo]: Added support for DebugInfo generation for auto return type for C++ member functions."
Summary:
This was reverted in 328e0f3dca due to
chromium bot failure. This revision addresses that case.

Original commit message:
Summary:
    This patch will provide support for auto return type for the C++ member
    functions. Before this return type of the member function is deduced and
    stored in the DIE.
    This patch includes llvm side implementation of this feature.

    Patch by: Awanish Pandey <Awanish.Pandey@amd.com>

    Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george

    Reviewed by: dblaikie

    Differential Revision: https://reviews.llvm.org/D70524
2020-01-20 15:13:13 +05:30
Fangrui Song eaab1bf21e [StackColoring] Remap FixedStackPseudoSourceValue frame index referenced by MachineMemOperand
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.

This can cause an assertion failure when LiveDebugValues references a dead stack object.

It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
2020-01-19 22:53:45 -08:00
Fangrui Song 886d2c2ca7 [BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets()
If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of
Start. There is no correctness issue, but it can create more block
splits.
2020-01-19 16:02:16 -08:00
Fangrui Song 9a24488cb6 [CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).

This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.

Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
2020-01-19 00:09:46 -08:00
Fangrui Song 9583a3f262 [AsmPrinter] Delete dead takeDeletedSymbsForFunction()
The code added in r98579 is dead now.
2020-01-18 17:08:00 -08:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
David Blaikie 58b10df54f DebugInfo: Move SectionLabel tracking into CU's addRange
This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).

Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).
2020-01-17 18:12:34 -08:00
Derek Schuff ff171acf84 [WebAssembly] Track frame registers through VReg and local allocation
This change has 2 components:

Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded.  That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.

WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run.  Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase

The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.

This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds

Differential Revision: https://reviews.llvm.org/D71681
2020-01-17 17:23:56 -08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Reid Kleckner 423e3db6a8 Remove unneeded FoldingSet.h include from Attributes.h
Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h
needs Allocator.h, which is relatively expensive.
2020-01-17 16:36:09 -08:00
Evgenii Stepanov d081962dea Merge memtag instructions with adjacent stack slots.
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.

This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.

This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).

Reviewers: pcc, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70286
2020-01-17 15:19:29 -08:00
Ian Levesque 97ba483026 [xray] Allow instrumenting only function entry and/or only function exit
Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit.  For use cases
that only care about one or the other this will save significant overhead
and code size.

Differential Revision: https://reviews.llvm.org/D72890
2020-01-17 13:32:34 -08:00
Ian Levesque 7628e474a5 [xray] Add xray-ignore-loops option
XRay allows tuning by minimum function size, but also always instruments
functions with loops in them.  If the minimum function size is set to a
large value the loop instrumention ends up causing most functions to be
instrumented anyway.  This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.

Differential Revision: https://reviews.llvm.org/D72659
2020-01-17 13:32:17 -08:00
Adrian Prantl 7b30370e5b Move the sysroot attribute from DIModule to DICompileUnit
[this re-applies c0176916a4
 with the correct commit message and phabricator link]

This addresses point 1 of PR44213.
https://bugs.llvm.org/show_bug.cgi?id=44213

The DW_AT_LLVM_sysroot attribute is used for Clang module debug info,
to allow LLDB to import a Clang module from source. Currently it is
part of each DW_TAG_module, however, it is the same for all modules in
a compile unit. It is more efficient and less ambiguous to store it
once in the DW_TAG_compile_unit.

This should have no effect on DWARF consumers other than LLDB.

Differential Revision: https://reviews.llvm.org/D71732
2020-01-17 12:55:40 -08:00
Adrian Prantl c17aee67f1 Revert "Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot"
This reverts commit 12e479475a.

I accidentally landed this patch with the wrong commit message ...
2020-01-17 12:52:36 -08:00
Adrian Prantl 12e479475a Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot
This is a purely cosmetic change that is NFC in terms of the binary
output. I bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.

This attribute only appears in Clang module debug info.

Differential Revision: https://reviews.llvm.org/D71722
2020-01-17 09:36:48 -08:00
Simon Pilgrim 1dc2f25790 [SelectionDAG] ComputeKnownBits - assert we're computing the 0'th (difference) result for the SUB/SUBC cases
Matches what we already do for the ADD/ADDC/ADDE case.
2020-01-17 13:53:57 +00:00
Sam Parker 42350cd893 [ARM][MVE] Tail Predicate IsSafeToRemove
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.

As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.

Differential Revision: https://reviews.llvm.org/D71837
2020-01-17 13:19:14 +00:00
Simon Pilgrim f611158350 [SelectionDAG] Better ISD::ANY_EXTEND/ISD::ANY_EXTEND_VECTOR_INREG ComputeKnownBits support
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - its just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
2020-01-17 11:37:58 +00:00
Davide Italiano 30a8865142 [FastISel] Lower `llvm.dbg.value(undef, ...` correctly.
Summary:
Instead of just dropping them.

<rdar://problem/58657146>

Reviewers: aprantl, vsk, ab, paquette, echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72877
2020-01-16 16:22:20 -08:00
Derek Schuff 80906d9d16 Revert "[WebAssembly] Track frame registers through VReg and local allocation"
This reverts commit 3a05c3969c.
It breaks under expensive-checks and on Windows
2020-01-16 14:38:00 -08:00
Derek Schuff 3a05c3969c [WebAssembly] Track frame registers through VReg and local allocation
This change has 2 components:

Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded.  That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.

WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run.  Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase

The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.

Differential Revision: https://reviews.llvm.org/D71681
2020-01-16 13:51:17 -08:00
Matt Arsenault a66d2817ca GlobalISel: Don't ignore requested ext narrowing type
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
2020-01-16 14:29:37 -05:00
Matt Arsenault be31a7b7ee GlobalISel: Move extension scalar narrowing to separate function
Also rename a few things. Handling a different requested type will
require this to become much more complex.
2020-01-16 14:29:37 -05:00
Craig Topper 61a89e17df [LegalizeDAG][Mips] Add an assert to protect a uint_to_fp implementation from double rounding. Add a i32->f32 uint_to_fp implementation that avoids this code.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.

To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.

Differential Revision: https://reviews.llvm.org/D72794
2020-01-16 11:08:16 -08:00
Matt Arsenault d0943537e1 GlobalISel: Apply target MMO flags to atomics
Unify MMO flag handling with SelectionDAG like with loads and stores.
2020-01-16 13:49:43 -05:00
Matt Arsenault 0d0fce42b0 GlobalISel: Preserve load/store metadata in IRTranslator
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.

Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
2020-01-16 13:49:43 -05:00
Jay Foad 885260d5d8 [GlobalISel] Don't arbitrarily limit a mask to 64 bits
Reviewers: arsenm

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72853
2020-01-16 16:13:20 +00:00
Jay Foad 63f73545dd [GlobalISel] Pass MachineOperands into MachineIRBuilder helper methods
Reviewers: arsenm, aditya_nandakumar, aemerson

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72849
2020-01-16 16:04:21 +00:00
Sam Parker 760b175109 [ARM][LowOverheadLoops] Update liveness info
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:

After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-16 15:44:25 +00:00
Jay Foad 28bb43bdf8 [GlobalISel] Use more MachineIRBuilder helper methods
Reviewers: arsenm, nhaehnle

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72833
2020-01-16 15:34:51 +00:00
Jeremy Morse c969335abd Revert "[PHIEliminate] Move dbg values after phi and label"
Testing compiler-rt, a new assertion failure occurs when building
the GwpAsanTestObjects object. I'm uploading a reproducer to D70597.

This reverts commit 75188b01e9.
2020-01-16 14:01:27 +00:00
Chris Ye 75188b01e9 [PHIEliminate] Move dbg values after phi and label
If there are DBG_VALUEs between phi and label (after phi and before label),
DBG_VALUE will block PHI lowering after the LABEL. Moving all DBG_VALUEs
after Labels in the function ScheduleDAGSDNodes::EmitSchedule to avoid
impacting PHI lowering.

  before:
     PHI
     DBG_VALUE
     LABEL
  after: (move DBG_VALUE after label)
     PHI
     LABEL
     DBG_VALUE
  then: (phi lowering after label)
     LABEL
     COPY
     DBG_VALUE

Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43859

Differential Revision: https://reviews.llvm.org/D70597
2020-01-16 11:58:09 +00:00
Craig Topper 5cf1b01a01 [LegalizeDAG][TargetLowering] Move vXi64/i64->vXf32/f32 uint_to_fp legalizing code from TargetLowering::expandUINT_TO_FP back to LegalizeDAG.
This was moved in October 2018, but we don't appear to be using
this for vectors on any in tree target.

Moving it back simplifies D72794 so we can share the code for i32->f32.
2020-01-15 22:04:50 -08:00
Stanislav Mekhanoshin 8b417dd3d6 Process BUNDLE in tail duplication
When tail duplication estimates a size of tail it uses instruction
count. Account for a number of instrictions in a bundle too.

Differential Revision: https://reviews.llvm.org/D72783
2020-01-15 15:46:57 -08:00
Matt Arsenault 25e9938a45 GlobalISel: Handle more cases of G_SEXT narrowing
This now develops the same problem G_ZEXT/G_ANYEXT have where the
requested type is assumed to be the source type. This will be fixed
separately by creating intermediate merges.
2020-01-15 18:33:15 -05:00
Vedant Kumar 43464509fc DWARF: Simplify the way the return PC is attached to call site tags, NFC
This cleanup was suggested by Djordje in D72489.
2020-01-15 14:16:21 -08:00