Commit Graph

42427 Commits

Author SHA1 Message Date
Ronak Chauhan 487a805310 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713
2020-09-08 21:26:11 +05:30
Xing GUO 25c3fa3f13 [DWARFYAML] Make the debug_ranges section optional.
This patch makes the debug_ranges section optional. When we specify an
empty debug_ranges section, yaml2obj only emits the section header.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87263
2020-09-08 19:55:47 +08:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Simon Pilgrim 783d7116dc AntiDepBreaker.h - remove unnecessary ScheduleDAG.h include. NFCI. 2020-09-07 16:39:42 +01:00
Simon Pilgrim 6670f5d1e6 MachineStableHash.h - remove MachineInstr.h include. NFC.
Use forward declarations and move the include to MachineStableHash.cpp
2020-09-07 13:33:48 +01:00
Sam Parker 928c4b4b49 [SCEV] Refactor isHighCostExpansionHelper
To enable the cost of constants, the helper function has been
reorganised:
- A struct has been introduced to hold SCEV operand information so
  that we know the user of the operand, as well as the operand index.
  The Worklist now uses instead instead of a bare SCEV.
- The costing of each SCEV, and collection of its operands, is now
  performed in a helper function.

Differential Revision: https://reviews.llvm.org/D86050
2020-09-07 11:57:46 +01:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
Jay Foad 713c2ad60c [GlobalISel] Extend not_cmp_fold to work on conditional expressions
Differential Revision: https://reviews.llvm.org/D86709
2020-09-07 09:31:08 +01:00
Xing GUO 40f4131fce [DWARFYAML] Make the debug_addr section optional.
This patch makes the debug_addr section optional. When an empty
debug_addr section is specified, yaml2obj only emits a section header
for it.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87205
2020-09-07 16:17:18 +08:00
Raphael Isemann a98b126696 Add BinaryFormat/ELFRelocs/CSKY.def to LLVM modulemap 2020-09-07 10:14:22 +02:00
Jay Foad 5350e1b509 [KnownBits] Implement accurate unsigned and signed max and min
Use the new implementation in ValueTracking, SelectionDAG and
GlobalISel.

Differential Revision: https://reviews.llvm.org/D87034
2020-09-07 09:09:01 +01:00
Zi Xuan Wu 69f2c79f2a [ELF] Add a new e_machine value EM_CSKY and add some CSKY relocation types
This is the split part of D86269, which add a new ELF machine flag called EM_CSKY and related relocations.
Some target-specific flags and tests for csky can be added in follow-up patches later.

Differential Revision: https://reviews.llvm.org/D86610
2020-09-07 10:42:28 +08:00
Amy Kwan efa57f9a7a [PowerPC] Implement Vector Expand Mask builtins in LLVM/Clang
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82727
2020-09-06 17:13:21 -05:00
Benjamin Kramer 8c386c9474 [SmallVector] Move error handling out of line
This reduces duplication and avoids emitting ice cold code into every
instance of grow().
2020-09-06 18:06:44 +02:00
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
Lang Hames 3b64052a25 [ORC] Fix some bugs in TPCDynamicLibrarySearchGenerator, use in llvm-jitlink.
TPCDynamicLibrarySearchGenerator was generating errors on missing
symbols, but that doesn't fit the DefinitionGenerator contract: A symbol
that isn't generated by a particular generator should not cause an
error.

This commit fixes the error by using SymbolLookupFlags::WeaklyReferencedSymbol
for all elements of the lookup, and switches llvm-jitlink to use
TPCDynamicLibrarySearchGenerator.
2020-09-04 13:23:52 -07:00
Teresa Johnson 45c3560384 [HeapProf] Address post-review comments in instrumentation code
Addresses post-review comments from D85948, which can be found here:
https://reviews.llvm.org/rG7ed8124d46f9.
2020-09-04 08:59:00 -07:00
Simon Pilgrim 7582c5c023 CallingConvLower.h - remove unnecessary MachineFunction.h include. NFC.
Reduce to forward declaration, add the Register.h include that we still needed, move CCState::ensureMaxAlignment into CallingConvLower.cpp as it was the only function that needed the full definition of MachineFunction.

Fix a few implicit dependencies further down.
2020-09-04 12:16:48 +01:00
Simon Pilgrim 3a1308be05 MIRFormatter.h - remove MachineInstr.h include. NFC.
Use forward declarations and include the inner dependencies directly.
2020-09-04 11:17:24 +01:00
David Sherwood 73a3d350a4 [SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:

  CodeGen/AArch64/sve-split-extract-elt.ll
  CodeGen/AArch64/sve-split-insert-elt.ll

In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.

I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.

Differential Revision: https://reviews.llvm.org/D86431
2020-09-04 09:51:31 +01:00
Florian Hahn e2fc6a31d3 [MemCpyOpt] Preserve MemorySSA.
This patch updates MemCpyOpt to preserve MemorySSA. It uses the
MemoryDef at the insertion point of the builder and inserts the new def
after that def.

In some cases, we just modify a memory instruction. In that case, get
the defining access, then remove the memory access and add a new one.
If the defining access is in a different block, insert a new def at the
beginning of the current block, otherwise after the defining access.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86651
2020-09-04 09:05:33 +01:00
Fangrui Song 2dd9a4d855 [SmallVector] Include stdexcept if LLVM_ENABLE_EXCEPTIONS
std::length_error needs stdexcept.
2020-09-03 18:06:08 -07:00
Michael Liao bf41c4d29e [codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their
  target flags being cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109
2020-09-03 18:37:39 -04:00
Puyan Lotfi 7fff1fbd3c [MIRVRegNamer] Experimental MachineInstr stable hashing (Fowler-Noll-Vo)
This hashing scheme has been useful out of tree, and I want to start
experimenting with it. Specifically I want to experiment on the
MIRVRegNamer, MIRCanononicalizer, and eventually the MachineOutliner.

This diff is a first step, that optionally brings stable hashing to the
MIRVRegNamer (and as a result, the MIRCanonicalizer).  We've tested this
hashing scheme on a lot of MachineOperand types that llvm::hash_value
can not handle in a stable manner.

This stable hashing was also the basis for

"Global Machine Outliner for ThinLTO" in EuroLLVM 2020

http://llvm.org/devmtg/2020-04/talks.html#TechTalk_58

Credits: Kyungwoo Lee, Nikolai Tillmann

Differential Revision: https://reviews.llvm.org/D86952
2020-09-03 16:13:09 -04:00
Arthur Eubanks c9771391ce [NewPM][Lint] Port -lint to NewPM
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87057
2020-09-03 13:03:44 -07:00
Wenlei He d1be928d23 SVML support for log2
Although LLVM supports vectorization of loops containing log2, it did not support using SVML implementation of it. Added support so that when clang is invoked with -fveclib=SVML now an appropriate SVML library log2 implementation will be invoked.

Follow up on: https://reviews.llvm.org/D77114

Tests:
Added unit tests to svml-calls.ll, svml-calls-finite.ll. Can be run with llvm-lint.
Created a simple c++ file that tests log2, and used clang+ to build it, and output final assembly.

Reviewed By: wenlei, craig.topper

Differential Revision: https://reviews.llvm.org/D86730
2020-09-03 11:52:29 -07:00
Jamie Schmeiser b2e65cf950 Revert "Add new hidden option -print-changed which only reports changes to IR"
This reverts commit 7bc9924cb2 due to
failure caused by missing a space between trailing >>, required by some
versions of C++:wq.
2020-09-03 18:41:20 +00:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Dimitry Andric f26fc56840 Eliminate the sizing template parameter N from CoalescingBitVector
Since the parameter is not used anywhere, and the default size of 16
apparently causes PR47359, remove it. This ensures that IntervalMap will
automatically determine the optimal size, using its NodeSizer struct.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D87044
2020-09-03 18:15:41 +02:00
Jamie Schmeiser 7bc9924cb2 Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.

The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
See https://hotcrp.llvm.org/usllvm2020/paper/29 for more information.

Reviewed By: yrouban (Yevgeny Rouban)

Differential Revision: https://reviews.llvm.org/D86360
2020-09-03 15:52:35 +00:00
Simon Pilgrim 898e42db93 GlobalISel/Utils.h - remove unused includes. NFCI.
Twine is unused, and TargetLowering can be reduced to a forward declaration and moved to Utils.cpp
2020-09-03 15:59:12 +01:00
Sanjay Patel bdd5bfd0e4 [IR][GVN] add/allow commutative intrinsics with >2 args
Follow-up to D86798 and rGe25449f.
2020-09-03 10:14:53 -04:00
Florian Hahn a344b382a0 [GVN] Preserve MemorySSA if it is available.
Preserve MemorySSA if it is available before running GVN.

DSE with MemorySSA will run closely after GVN. If GVN and 2 other
passes preserve MemorySSA, DSE can re-use MemorySSA used by LICM
when doing LTO.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86534
2020-09-03 12:28:13 +01:00
Martin Storsjö f5e2ea9a43 [AArch64] Add asm directives for the remaining SEH unwind codes
Add support in llvm-readobj for displaying them and support in the
asm parsser, AArch64TargetStreamer and MCWin64EH for emitting them.

The directives for the remaining basic opcodes have names that
match the opcode in the documentation.

The directives for custom stack cases, that are named
MSFT_OP_TRAP_FRAME, MSFT_OP_MACHINE_FRAME, MSFT_OP_CONTEXT
and MSFT_OP_CLEAR_UNWOUND_TO_CALL, are given matching assembler
directive names that fit into the rest of the opcode naming;
.seh_trap_frame, .seh_context, .seh_clear_unwound_to_call

The opcode MSFT_OP_MACHINE_FRAME is mapped to the existing
opecode enum UOP_PushMachFrame that is used on x86_64, and also
uses the corresponding existing x86_64 directive name
.seh_pushframe.

Differential Revision: https://reviews.llvm.org/D86889
2020-09-03 11:12:01 +03:00
Raphael Isemann 9124fa5920 Fix broken HUGE_VALF macro in llvm-c/DataTypes.h
Commit 3a29393b47 removes the cmath/math.h
includes from the DataTypes.h header to speed up parsing. However the
DataTypes.h header was using this header to get the macro `HUGE_VAL` for its own
`HUGE_VALF` macro definition. Now the macro instead just expands into a plain
`HUGE_VAL` token which leads to compiler errors unless `math.h` was previously
included by the including source file. It also leads to compiler warnings with
enabled module builds which point out this inconsistency.

The correct way to fix this seems to be to just remove HUGE_VALF from the
header. llvm-c is not referencing that macro from what I can see and users
probably should just include the math headers if they need it (or define it on
their own for really old C versions).

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D83761
2020-09-03 09:50:32 +02:00
Jonas Devlieghere 3746906193 [lldb] Add reproducer verifier
Add a reproducer verifier that catches:

 - Missing or invalid home directory
 - Missing or invalid working directory
 - Missing or invalid module/symbol paths
 - Missing files from the VFS

The verifier is enabled by default during replay, but can be skipped by
passing --reproducer-no-verify.

Differential revision: https://reviews.llvm.org/D86497
2020-09-02 22:00:00 -07:00
Arthur Eubanks e440b4933a Revert "[NewPM][Lint] Port -lint to NewPM"
This reverts commit 883399c840.
2020-09-02 21:34:29 -07:00
Arthur Eubanks 883399c840 [NewPM][Lint] Port -lint to NewPM
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87057
2020-09-02 21:13:01 -07:00
Craig Topper b16e8687ab [CodeGenPrepare][X86] Teach optimizeGatherScatterInst to turn a splat pointer into GEP with scalar base and 0 index
This helps SelectionDAGBuilder recognize the splat can be used as a uniform base.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D86371
2020-09-02 20:44:12 -07:00
Geoffrey Martin-Noble 848b0e244c Improve error handling for SmallVector programming errors
This patch changes errors in `SmallVector::grow` that are independent of
memory capacity to be reported using report_fatal_error or
std::length_error instead of report_bad_alloc_error, which falsely signals
an OOM.

It also cleans up a few related things:
- makes report_bad_alloc_error to print the failure reason passed
  to it.
- fixes the documentation to indicate that report_bad_alloc_error
  calls `abort()` not "an assertion"
- uses a consistent name for the size/capacity argument to `grow`
  and `grow_pod`

Reviewed By: mehdi_amini, MaskRay

Differential Revision: https://reviews.llvm.org/D86892
2020-09-02 15:00:26 -07:00
Jay Foad 099c089d4b [APInt] New member function setBitVal
Differential Revision: https://reviews.llvm.org/D87033
2020-09-02 21:40:31 +01:00
Albion Fung 5d1fe3f903 [PowerPC] Implemented Vector Multiply Builtins
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.

Differential Revision: 	https://reviews.llvm.org/D83955
2020-09-02 14:16:21 -05:00
Paul Walker f72121254d [SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:

  extract_subvec(binop(insert_subvec(a), insert_subvec(b)))

are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.

This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.

Differential Revision: https://reviews.llvm.org/D86450
2020-09-02 11:01:33 +01:00
Sander de Smalen f13beac51b [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For  e.g.

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Zi Xuan Wu b21ddded8f [RFC][Target] Add a new triple called Triple::csky
Before upstream a new target called CSKY, make a new triple of that called Triple::csky.
For now, it's a 32-bit little endian target and the detail can be referred at D86269.

This is the split part of D86269, which add a new target called CSKY.

Differential Revision: https://reviews.llvm.org/D86505
2020-09-02 12:46:09 +08:00
Alina Sbirlea 1ccfb52a61 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Varun Gandhi 94948f3c92 [ADT] Make Optional a literal type.
This allows returning Optional values from constexpr contexts.

Reviewed By: fhahn, dblaikie, rjmccall

Differential Revision: https://reviews.llvm.org/D86354
2020-09-01 16:13:40 -07:00
Amy Kwan 0c2d872d5d [PowerPC] Implement builtins for xvcvspbf16 and xvcvbf16spn
This patch adds the builtin implementation for the xvcvspbf16 and xvcvbf16spn
instructions.

Differential Revision: https://reviews.llvm.org/D86795
2020-09-01 17:16:43 -05:00
Cameron McInally cfe2b81710 [SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount().
A small piece of the project to replace getVectorNumElements() with getVectorElementCount().

Differential Revision: https://reviews.llvm.org/D86894
2020-09-01 16:51:44 -05:00
Sergej Jaskiewicz fad75598d2 [llvm] [unittests] Remove temporary files after they're not needed
Some LLVM unit tests forget to clean up temporary files and
directories. Introduce RAII classes for cleaning them up.

Refactor the tests to use those classes.

Differential Revision: https://reviews.llvm.org/D83228
2020-09-02 00:34:44 +03:00
Amara Emerson 520ab710fb Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")"
This reverts commit 8693ddc743.

Re-committing with the test requiring asserts.
2020-09-01 14:29:04 -07:00
Jordan Rupprecht 8693ddc743 Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")
This reverts commit 8ad8f484b6. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g.
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio.
Note that the crash does not seem to reproduce in debug builds.

5ded444252 depends on this, so revert that too.
2020-09-01 13:31:57 -07:00
Florian Hahn 0d966ae4b2 [Loads] Add canReplacePointersIfEqual helper.
This patch adds an initial, incomeplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, that are deemed to be equivalent through
some means (e.g. information from conditions).

Note that is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.

LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.

The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.

This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D85524
2020-09-01 20:57:41 +01:00
Arthur Eubanks 96f0b57568 [Bindings] Add LLVMAddInstructionSimplifyPass
Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D86764
2020-09-01 12:38:49 -07:00
Amara Emerson 8ad8f484b6 [GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)
This is needed for an upcoming change to how we translate conditional branches
which might generate these.

Differential Revision: https://reviews.llvm.org/D86383
2020-09-01 10:57:17 -07:00
Matt Arsenault 32a8a10b42 GlobalISel: Implement computeNumSignBits for G_SELECT 2020-09-01 12:50:19 -04:00
Matt Arsenault 759482ddaa GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE 2020-09-01 12:49:57 -04:00
Volkan Keles 061182b7ba GlobalISel: Add combines for extend operations
https://reviews.llvm.org/D86516
2020-09-01 08:50:06 -07:00
Matt Arsenault 92090e8bd8 GlobalISel: Implement computeKnownBits for G_UNMERGE_VALUES 2020-09-01 11:19:27 -04:00
Matt Arsenault 18bbd9f15e GlobalISel: Artifact combine unmerge of unmerge
Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC
could be implemented in terms of G_UNMERGE_VALUES. Reducing the number
of elements in unmerge results ends up producing the original unmerge
type profile, so the artifact combiner needs to eliminate the
intermediate illegal registers. This avoids infinite looping in the
legalizer in a future change.

Assuming an unmerge has each result unmerged the same way, this ends
up producing a new unmerge of the source for every definition. I'm not
sure if the artifact combiner should either insert temporary merges
here and erase the original merge, or if the combiner should look at
uses from defs rather than defs from uses for unmerges.

In a few cases this regresses from using 16-bit shifts for 8-bit
values to using 32-bit shifts, but I think these can be legalized
later (the other legalization rules don't try very hard to use 16-bit
shifts either).
2020-09-01 11:01:33 -04:00
Anh Tuyen Tran 68717acb24 [LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass
Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays
into memcpy/memset function calls. In some particular situation, this transformation
introduces negative impacts. For example: https://bugs.llvm.org/show_bug.cgi?id=47300

This patch will enable users to disable a particular part of the transformation, while
he/she can still enjoy the benefit brought about by the rest of LIRP. The default
behavior stays unchanged: no part of LIRP is disabled by default.

Reviewed By: etiotto (Ettore Tiotto)

Differential Revision: https://reviews.llvm.org/D86262
2020-09-01 13:59:24 +00:00
Raphael Isemann 5ffd940ac0 Reland [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library
This relands e9a3d1a401 which was originally
missing linking LLVMSupport into LLMVFileCheck which broke the SHARED_LIBS build.

Original summary:

The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.

This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86344
2020-09-01 14:59:28 +02:00
Raphael Isemann 7c80f2da81 Revert "[lldb] Add reproducer verifier"
This reverts commit 297f69afac. It broke
the Fedora 33 x86-64 bot. See the review for more info.
2020-09-01 12:21:44 +02:00
David Sherwood 9fbb113247 [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

  CodeGen/AArch64/sve-split-load.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86697
2020-09-01 07:47:59 +01:00
Petr Hosek 3c7bfbd683 [CMake] Use find_library for ncurses
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.

With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.

This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.

Patch By: haampie

Differential Revision: https://reviews.llvm.org/D85820
2020-08-31 20:06:21 -07:00
Xing GUO 428b2ffad4 [DWARFYAML] Make the debug_str section optional.
This patch makes the debug_str section optional. When the debug_str
section exists but doesn't contain anything, yaml2obj will emit a
section header for it.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D86860
2020-09-01 10:02:09 +08:00
Jonas Devlieghere 297f69afac [lldb] Add reproducer verifier
Add a reproducer verifier that catches:

 - Missing or invalid home directory
 - Missing or invalid working directory
 - Missing or invalid module/symbol paths
 - Missing files from the VFS

The verifier is enabled by default during replay, but can be skipped by
passing --reproducer-no-verify.

Differential revision: https://reviews.llvm.org/D86497
2020-08-31 15:14:18 -07:00
Christopher Tetreault 867de151a5 [SVE] Mark VectorType::getNumElements() deprecated
getNumElements() is being removed from base VectorType in
order to eliminate the class of bugs in which a scalable vector
is accidentally treated like a fixed length vector. Clients of
this function should either call getElementCount(), and handle
the case where getElementCount().isScalable() is true, or they can
cast to FixedVectorType and call getNumElements() if they are
sure that the vector has fixed width.

Deprecated VectorType functions will be removed after the LLVM
12 branch.

See: http://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D78127
2020-08-31 15:13:04 -07:00
Sanjay Patel e25449ff57 [IR][GVN] allow intrinsics in Instruction's isCommutative query (2nd try)
The 1st try was reverted because I missed an assert that
needed softening.

As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-31 16:01:19 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Raphael Isemann ed89eb3571 Revert "[FileCheck] Move FileCheck implementation out of LLVMSupport into its own library"
This reverts commit e9a3d1a401. Seems the new
FileCheck library doesn't link on some bots. Reverting for now.
2020-08-31 11:38:40 +02:00
Raphael Isemann e9a3d1a401 [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library
The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.

This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86344
2020-08-31 11:24:41 +02:00
Sanjay Patel badd7264e1 Revert "[IR][GVN] allow intrinsics in Instruction's isCommutative query"
This reverts commit 25597f7783.
It is causing crashing on bots such as:
http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10523/steps/ninja-build/logs/stdio
2020-08-30 17:02:51 -04:00
Sanjay Patel 25597f7783 [IR][GVN] allow intrinsics in Instruction's isCommutative query
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-30 16:49:22 -04:00
Sanjay Patel 2d3e12818e [FastISel] update to use intrinsic's isCommutative(); NFC
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.

The intrinsic method was added with D86798 / rG096527214033
2020-08-30 11:36:41 -04:00
David Green 543c5425f1 [LV] Add some const to RecurrenceDescriptor. NFC 2020-08-30 12:27:51 +01:00
sstefan1 8d8ce85b23 [Attributor] Introduce module slice.
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D86319
2020-08-30 10:30:44 +02:00
Lang Hames e1d5f7d003 [ORC] Add getDFSLinkOrder / getReverseDFSLinkOrder methods to JITDylib.
DFS and Reverse-DFS linkage orders are used to order execution of
deinitializers and initializers respectively.

This patch replaces uses of special purpose DFS order functions in
MachOPlatform and LLJIT with uses of the new methods.
2020-08-29 15:17:06 -07:00
Benjamin Kramer 8e5b1557e5 [IR] Inline AttrBuilder::addAttribute. It just sets 1 bit. NFC. 2020-08-29 19:13:49 +02:00
Sanjay Patel 0965272140 [EarlyCSE] fold commutable intrinsics
Handling the new min/max intrinsics is the motivation, but it
turns out that we have a bunch of other intrinsics with this
missing bit of analysis too.

The FP min/max tests show that we are intersecting FMF,
so that part should be safe too.

As noted in https://llvm.org/PR46897 , there is a commutative
property specifier for intrinsics, but no corresponding function
attribute, and so apparently no uses of that bit. We may want to
remove that next.

Follow-up patches should wire up the Instruction::isCommutative()
to this IntrinsicInst specialization. That requires updating
callers to be aware of the more general commutative property
(not just binops).

Differential Revision: https://reviews.llvm.org/D86798
2020-08-29 12:11:01 -04:00
Martin Storsjö 20f7773bb4 [MC] [Win64EH] Fill in FuncletOrFuncEnd if missing
This can happen e.g. for code that declare .seh_proc/.seh_endproc
in assembly, or for code that use .seh_handlerdata (which triggers
the unwind info to be emitted before the end of the function).

The TextSection field must be made non-const to be able to use it
with Streamer.SwitchSection().

Differential Revision: https://reviews.llvm.org/D86528
2020-08-29 15:15:22 +03:00
Roman Lebedev c1b3e32118
[NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable
See
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
and
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html

InstSimply is not allowed to perform simplifications to instructions
that are not def-reachable from the original instruction.
2020-08-29 09:58:08 +03:00
Roman Lebedev 08669fbb43
[NFC][STLExtras] Add make_first_range(), similar to existing make_second_range()
Having just one of the two seens weird.
I wanted to use it a few times, but it wasn't there.
2020-08-29 09:58:07 +03:00
Xing GUO 12e832cbcb [DWARFYAML] Make the debug_abbrev_offset field optional.
This patch helps make the debug_abbrev_offset field optional. We don't
need to calculate the value of this field in the future.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86614
2020-08-29 14:54:52 +08:00
Fangrui Song b5ef137c11 [gcov] Increment counters with atomicrmw if -fsanitize=thread
Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.
2020-08-28 16:32:35 -07:00
Matt Arsenault 1b201914b5 GlobalISel: Combine out redundant sext_inreg
The scalar tests don't work yet, since computeNumSignBits apparently
doesn't handle sextload yet, and sext folds into the load first.
2020-08-28 17:57:31 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Arthur Eubanks cfde93e5d6 [ObjCARCOpt] Port objc-arc to NPM
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178
2020-08-28 12:59:33 -07:00
Tyker 6d3657417e [SROA] Improve handleling of assumes bundles by SROA
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86570
2020-08-28 21:55:45 +02:00
Snehasish Kumar 94faadaca4 [llvm][CodeGen] Machine Function Splitter
We introduce a codegen optimization pass which splits functions into hot and cold
parts. This pass leverages the basic block sections feature recently
introduced in LLVM from the Propeller project. The pass targets
functions with profile coverage, identifies cold blocks and moves them
to a separate section. The linker groups all cold blocks across
functions together, decreasing fragmentation and improving icache and
itlb utilization.

We evaluated the Machine Function Splitter pass on clang bootstrap and
SPECInt 2017.

For clang bootstrap we observe a mean 2.33% runtime improvement with a
~32% reduction in itlb and stlb misses. Additionally, L1 icache misses
reduced by 9.5% while L2 instruction misses reduced by 20%.

For SPECInt we report the change in IntRate the C/C++
benchmarks. All benchmarks apart from mcf and x264 improve, on average
by 0.6% with the max for deepsjeng at 1.6%.

Benchmark		% Change
500.perlbench_r		 0.78
502.gcc_r		 0.82
505.mcf_r		-0.30
520.omnetpp_r		 0.18
523.xalancbmk_r		 0.37
525.x264_r		-0.46
531.deepsjeng_r		 1.61
541.leela_r		 0.83
557.xz_r		 0.15

Differential Revision: https://reviews.llvm.org/D85368
2020-08-28 11:10:14 -07:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Sam Parker b30adfb529 [ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613
2020-08-28 13:56:16 +01:00
Martin Storsjö 37ef743cbf [MC] [Win64EH] Avoid producing malformed xdata records
If there's no unwinding opcodes, omit writing the xdata/pdata records.

Previously, this generated truncated xdata records, and llvm-readobj
would error out when trying to print them.

If writing of an xdata record is forced via the .seh_handlerdata
directive, skip it if there's no info to make a sensible unwind
info structure out of, and clearly error out if such info appeared
later in the process.

Differential Revision: https://reviews.llvm.org/D86527
2020-08-28 09:05:36 +03:00
serge-sans-paille b1f4e5979b (Expensive) Check for Loop, SCC and Region pass return status
This generalizes the logic introduced in https://reviews.llvm.org/D80916 to
other passes.

It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report
their status.

Differential Revision: https://reviews.llvm.org/D86589
2020-08-28 07:56:35 +02:00
Valentin Clement 4df2a5f782 [flang][openacc] Add check for tile clause restriction
The tile clause in OpenACC 3.0 imposes some restriction. Element in the tile size list are either * or a
constant positive integer expression. If there are n tile sizes in the list, the loop construct must be immediately
followed by n tightly-nested loops.
This patch implement these restrictions and add some tests.

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D86655
2020-08-27 22:13:46 -04:00
Harmen Stoppels cdcb9ab10e Revert "Use find_library for ncurses"
The introduction of find_library for ncurses caused more issues than it solved problems. The current open issue is it makes the static build of LLVM fail. It is better to revert for now, and get back to it later.

Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows"
This reverts commit 1ed1e16ab8.

Revert "Fix msan build"
This reverts commit 34fe9613dd.

Revert "[CMake] Always mark terminfo as unavailable on Windows"
This reverts commit 76bf26236f.

Revert "[CMake] Fix OCaml build failure because of absolute path in system libs"
This reverts commit 8e4acb82f7.

Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF"
This reverts commit 495f91fd33.

Revert "Use find_library for ncurses"
This reverts commit a52173a3e5.

Differential revision: https://reviews.llvm.org/D86521
2020-08-27 17:57:26 -07:00
Matt Arsenault abc99ab572 GlobalISel: Implement known bits for min/max 2020-08-27 16:56:17 -04:00
Matt Arsenault ee679638d7 MIR: Infer not-SSA for subregister defs
It's possible to have a single virtual register def with a subreg
index that would pass the previous check, but it's not possible to
have a subregister def in SSA.

This is in preparation for adding stricter checks for SSA MIR.
2020-08-27 16:56:16 -04:00
Vitaly Buka a6927c8621 [NFC][ValueTracking] Add OffsetZero into findAllocaForValue
For StackLifetime after finding alloca we need to check that
values ponting to the begining of alloca.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86692
2020-08-27 13:46:22 -07:00
Matt Arsenault 201f770f16 GlobalISel: Add and_trivial_mask to all_combines
Also make up a new category of combines.
2020-08-27 16:42:09 -04:00
Eli Friedman 8d21985a75 [RegisterScavenging] Delete dead function unprocess(). 2020-08-27 13:19:32 -07:00
Shinji Okumura 58d257b290 [Attributor] Do not add AA to dependency graph after the update stage
If an AA is registered to the dependency graph in the manifest stage, Attributor aborts in `::manifestAttributes()`.
This patch prevents such termination.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86734
2020-08-28 05:16:18 +09:00
Shinji Okumura c5e6872ec6 [Attributor] Guarantee getAAFor not to update AA in the manifestation stage
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86635
2020-08-28 04:07:42 +09:00
Christopher Tetreault 5a55e2781c [SVE] Remove calls to VectorType::getNumElements from IR
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D81500
2020-08-27 11:16:10 -07:00
Matt Arsenault 531f7063ba GlobalISel: Implement known bits for G_MERGE_VALUES 2020-08-27 14:07:18 -04:00
Mikhail Maltsev ae1396c7d4 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).

The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.

This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:43:16 +01:00
Aditya Nandakumar db464a3dbf [GISel] Add new GISel combiners for G_SELECT
https://reviews.llvm.org/D83833

Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and to combine away selects with constant comparisons. Patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.

Patch by  mkitzan
2020-08-27 09:40:15 -07:00
Shinji Okumura 7a68f0f1e0 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage in the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
Aditya Nandakumar 5c2db1655b [GISel]: Fix one more CSE Non determinism
https://reviews.llvm.org/D86676

Sometimes we can have the following code

 x:gpr(s32) = G_OP

Say we build G_OP2 to the same x and then delete the previous instruction. Using something like

 Register X = ...;
 auto NewMIB = CSEBuilder.buildOp2(X, ... args);

Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type).Unify the profiling by refactoring and calling the common method.

This was found by turning on the CSEInfo::verify in at the end of each of our GISel passes which turns inconsistent state/non determinism in CSEing into crashes which likely usually indicates missing calls to Observer on mutations (the most common case). Here non determinism usually means not cseing sometimes, but almost never about producing incorrect code.
Also this patch adds this verification at the end of the combiners as well.
2020-08-27 09:06:21 -07:00
Teresa Johnson 7ed8124d46 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
diggerlin 6923b0a76e Revert "[AIX][XCOFF] emit symbol visibility for xcoff object file."
This reverts commit a081868921.

Based on the Hubert Tong'comment  https://reviews.llvm.org/D84265#inline-799085
2020-08-27 11:07:58 -04:00
OCHyams 0b5a8050ea [DwarfDebug] Improve single location detection in validThroughout (2/4)
With this patch we're now accounting for two more cases which should be
considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where
RangeEnd comes before ScopeEnd when including meta instructions, but are both
preceded by the same non-meta instruction.

CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds.
`llvm-locstats` (using D85636) shows a very small variable location coverage
change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines
up with my expectations.

I've added a test which checks both of these new cases. The first check in the
test isn't strictly necessary for this patch. But I'm not sure that it is
explicitly tested anywhere else, and is useful for the final patch in the
series.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86151
2020-08-27 11:52:29 +01:00
Shinji Okumura 6c25eca614 [Attributor] Add flag for undef value to the state of AAPotentialValues
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for under values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85592
2020-08-27 16:30:29 +09:00
Jianzhou Zhao df182eb2d5 Fix an overflow issue at BackpatchWord
This happens when generating a huge file by LTO, for example, with -gmlt.
When BitNo is > 2^35, ByteNo is overflowed, and an incorrect output offset is overwritten.
This generates ill-formed bitcodes.

Reviewed-by: tejohnson, vitalybuka

Differential Revision: https://reviews.llvm.org/D86645
2020-08-27 04:46:19 +00:00
Amy Kwan 76b0f99ea8 [PowerPC] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82609
2020-08-26 23:14:34 -05:00
Matt Arsenault 0b7f6cc71a GlobalISel: Add generic instructions for memory intrinsics
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
2020-08-26 20:08:45 -04:00
Lang Hames 605df8112c [ORC][JITLink] Switch to unique ownership for EHFrameRegistrars.
This will make stateful registrars (e.g. a future TargetProcessControl based
registrar) easier to deal with.
2020-08-26 16:59:45 -07:00
Arthur Eubanks 486ed88533 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Juneyoung Lee 0c55889d80 [IR] Remove noundef from masked store/load/gather/scatter's pointer operands
As discussed in D86576, noundef attribute is removed from masked store/load/gather/scatter's
pointer operands.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86656
2020-08-27 06:41:43 +09:00
Wei Mi c67ccf5faf [SampleFDO] Enhance profile remapping support for searching inline instance
and indirect call promotion candidate.

Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.

However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.

To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.

In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).

Differential Revision: https://reviews.llvm.org/D86332
2020-08-26 11:07:35 -07:00
Juneyoung Lee 684b43c0cf [IR] Add NoUndef attribute to Intrinsics.td
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.

This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86576
2020-08-27 02:54:48 +09:00
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
Roman Lebedev c07a430bd3
[NFC][Value] Fixup comments, "N users" is NOT the same as "N uses".
In those cases, it really means "N uses".
2020-08-26 20:20:41 +03:00
Kai Nacke ed07e1fe0f [SystemZ/ZOS] Add header file to encapsulate use of <sysexits.h>
The non-standard header file `<sysexits.h>` provides some return values.
`EX_IOERR` is used to as a special value to signal a broken pipe to the clang driver.
On z/OS Unix System Services, this header file does not exists. This patch

- adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX`
- adds a new header file `llvm/Support/ExitCodes`, which either includes
  `<sysexits.h>` or defines `EX_IOERR`
- updates the users of `EX_IOERR` to include the new header file

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D83472
2020-08-26 12:44:30 -04:00
Sjoerd Meijer bda8fbe2d2 [LV] Fallback strategies if tail-folding fails
This implements 2 different vectorisation fallback strategies if tail-folding
fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This
can be controlled with option -prefer-predicate-over-epilogue, that has been
changed to take a numeric value corresponding to the tail-folding preference
and preferred fallback.

Patch by: Pierre van Houtryve, Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D79783
2020-08-26 16:55:25 +01:00
Dibya Ranjan Mishra a7da7e421c [Support] Allow printing the stack trace only for a given depth
Differential Revision: https://reviews.llvm.org/D85458
2020-08-26 09:27:42 -04:00
Matt Arsenault eb074088c9 GlobalISel: Combine G_ADD of G_PTRTOINT to G_PTR_ADD
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
2020-08-26 08:57:15 -04:00
Xing GUO 8daa3264a3 [DWARFYAML] Make the unit_length and header_length fields optional.
This patch makes the unit_length and header_length fields of line tables
optional. yaml2obj is able to infer them for us.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86590
2020-08-26 20:35:10 +08:00
QingShan Zhang ebf3b188c6 [Scheduling] Implement a new way to cluster loads/stores
Before calling target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake cluster due to dependency.
For now, we are putting the loads/stores into the same group if they have
the same predecessor. We assume that, if two loads/stores have the same
predecessor, it is likely that, they didn't have dependency for each other.

However, one SUnit might have several predecessors and for now, we just
pick up the first predecessor that has non-data/non-artificial dependency,
which is too arbitrary. And we are struggling to fix it.

So, I am proposing some better implementation.
1. Collect all the loads/stores that has memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seeking for the first non-dependency instruction with the
   sorted order, and check if they can cluster or not.

Reviewed By: Jay Foad

Differential Revision: https://reviews.llvm.org/D85517
2020-08-26 12:33:59 +00:00
Georgii Rymar 92c527e5a2 [llvm/Object] - Make dyn_cast<XCOFFObjectFile> work as it should.
Currently, `dyn_cast<XCOFFObjectFile>` always does cast and returns a pointer,
even when we pass `ELF`/`Wasm`/`Mach-O` or `COFF` instead of `XCOFF`.

It happens because `XCOFFObjectFile` class does not implement `classof`.
I've fixed it and added a unit test.

Differential revision: https://reviews.llvm.org/D86542
2020-08-26 15:09:55 +03:00
Gabriel Hjort Åkerlund 5e23dc5b47 [GlobalISel] Fix and tidy up documentation for ValueMapping class (NFC)
The documentation was missing a '*/' in '/*<2x32-bit> vadd {0, 64, VPR}',
and the example code are now aligned to improve readability.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86201
2020-08-26 12:09:01 +02:00
sstefan1 99d18f7964 Reland [IR] Intrinsics default attributes and opt-out flag
Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.

Differential Revision: https://reviews.llvm.org/D70365
2020-08-26 11:37:59 +02:00
Jan Kratochvil b20a4e293c [Support] Speedup llvm-dwarfdump 3.9x
Currently `strace llvm-dwarfdump x.debug >/tmp/file`:

  ioctl(1, TCGETS, 0x7ffd64d7f340)        = -1 ENOTTY (Inappropriate ioctl for device)
  write(1, "           DW_AT_decl_line\t(89)\n"..., 4096) = 4096
  ioctl(1, TCGETS, 0x7ffd64d7f400)        = -1 ENOTTY (Inappropriate ioctl for device)
  ioctl(1, TCGETS, 0x7ffd64d7f410)        = -1 ENOTTY (Inappropriate ioctl for device)
  ioctl(1, TCGETS, 0x7ffd64d7f400)        = -1 ENOTTY (Inappropriate ioctl for device)

After this patch:

  write(1, "0000000000001102 \"strlen\")\n     "..., 4096) = 4096
  write(1, "site\n                  DW_AT_low"..., 4096) = 4096
  write(1, "d53)\n\n0x000e4d4d:       DW_TAG_G"..., 4096) = 4096

The same speedup can be achieved by `--color=0` but that is not much convenient.

This implementation has been suggested by Joerg Sonnenberger.

Differential Revision: https://reviews.llvm.org/D86406
2020-08-26 10:29:46 +02:00
Shinji Okumura 3050713798 [Attributor] Provide an edge-based interface in AAIsDead
This patch produces an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85547
2020-08-26 16:57:52 +09:00
Adrien Guinet c6f7ac0071 [llvm-lipo] Add support for bitcode files
A Mach-O universal binary may contain bitcode as a slice.
This diff adds proper handling of such binaries to llvm-lipo.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D85740
2020-08-25 21:11:18 -07:00
Mircea Trofin 7cfcecece0 [MLInliner] Simplify TFUTILS_SUPPORTED_TYPES
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.

Differential Revision: https://reviews.llvm.org/D86549
2020-08-25 14:19:39 -07:00
Krzysztof Parzyszek 514d6e9a8d [SDAG] Improve MemSDNode::getBasePtr
It returned getOperand(1), except for STORE for which it returned
getOperand(2). Handle MSTORE, MGATHER, and MSCATTER as well.
2020-08-25 15:19:52 -05:00
Juneyoung Lee f753f5b050 [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Lang Hames 594107d488 [ORC] Fix an endif comment. 2020-08-25 11:51:20 -07:00
Jeremy Morse 121a49d839 [LiveDebugValues] Add switches for using instr-ref variable locations
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.

Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:

  http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".

Differential Revision: https://reviews.llvm.org/D83048
2020-08-25 14:58:48 +01:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Freddy Ye e02d081f2b [X86] Support -march=sapphirerapids
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86503
2020-08-25 14:21:21 +08:00
Fangrui Song 44ee9d070a Revert D85812 "[coroutine] should disable inline before calling coro split"
This reverts commit 2e43acfed8.

LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depent on Coroutines.h (circular dependency).

The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
2020-08-24 11:41:05 -07:00
Matt Arsenault 116affb18d TableGen/GlobalISel: Allow inst matcher to check multiple opcodes
This is to initially handleg immAllOnesV, which should match
G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC. In the future, it could be
used for other patterns cases that map to multiple G_* instructions,
such as G_ADD and G_PTR_ADD.
2020-08-24 13:48:51 -04:00
dongAxis 2e43acfed8 [coroutine] should disable inline before calling coro split
summary:
When callee coroutine function is inlined into caller coroutine
function before coro-split pass, llvm will emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that coro-early
pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute
"coroutine.presplit" (it means the function has not been splited) to
fix this issue

TestPlan: check-llvm

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D85812
2020-08-24 22:22:08 +08:00
Francesco Petrogalli 5a34b3ab95 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
   `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
   ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.

To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.

Note that in some places the idiom

```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```

has been replaced with

```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```

The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.

Reviewed By: rengolin, ctetreau

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:54:03 +00:00
Francesco Petrogalli bad7d6b373 Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"
Reverting because the commit message doesn't reflect the one agreed on
phabricator at https://reviews.llvm.org/D85794.

This reverts commit c8d2b065b9.
2020-08-24 13:50:55 +00:00
Matt Arsenault e1644a3779 GlobalISel: Reduce G_SHL width if source is extension
shl ([sza]ext x, y) => zext (shl x, y).

Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:

This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.

TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.

Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
2020-08-24 09:42:40 -04:00
Francesco Petrogalli c8d2b065b9 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.

To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:39:42 +00:00
Sam Parker 4ce176bed2 [SCEV] Still (again) trying to fix buildbots 2020-08-24 11:24:30 +01:00
Sam Parker 2e194fe73b [SCEV] Still trying to fix windows buildbots 2020-08-24 10:26:48 +01:00
Sam Parker e286c600e1 [SCEV] Attempt to fix windows buildbots 2020-08-24 08:29:22 +01:00
Sam Parker b999400a4f [SCEV] Add operand methods to Cast and UDiv
Add methods to access operands in a similar manner to NAryExpr.

Differential Revision: https://reviews.llvm.org/D86083
2020-08-24 06:57:07 +01:00
Craig Topper cc7bf9bcbf [X86] Allow 32-bit mode only CPUs with -mtune on 64-bit targets
gcc errors on this, but I'm nervous that since -mtune has been
ignored by clang for so long that there may be code bases out
there that pass 32-bit cpus to clang.
2020-08-22 16:38:05 -07:00
Matt Arsenault 901e3317fe GlobalISel: Merge FewerElements for G_BUILD_VECTOR/G_CONCAT_VECTORS
This switches from using G_EXTRACT in odd cases to widen with undef
and unmerge.
2020-08-22 10:25:53 -04:00
Florian Hahn 5e7e2162d4 [DSE,MemorySSA] Use BatchAA for AA queries.
We can use BatchAA to avoid some repeated AA queries. We only remove
stores, so I think we will get away with using a single BatchAA instance
for the complete run.

The changes in AliasAnalysis.h mirror the changes in D85583.

The change improves compile-time by roughly 1%.
http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86275
2020-08-22 08:36:35 +01:00
Sourabh Singh Tomar f91d18eaa9 [DebugInfo][flang]Added support for representing Fortran assumed length strings
This patch adds support for representing Fortran `character(n)`.

Primarily patch is based out of D54114 with appropriate modifications.

Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.

Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.

Source snippet:
```
 program assumedLength
   call sub('Hello')
   call sub('Goodbye')
   contains
   subroutine sub(string)
           implicit none
           character(len=*), intent(in) :: string
           print *, string
   end subroutine sub
 end program assumedLength
```

GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```

Reviewed By: aprantl, schweitz

Differential Revision: https://reviews.llvm.org/D86305
2020-08-22 10:13:40 +05:30
Alina Sbirlea f55ad3973d [DomTree] Extend update API to allow a post CFG view.
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.

Differential Revision: https://reviews.llvm.org/D85472
2020-08-21 17:23:08 -07:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
Nicolai Hähnle b37db11d95 MachineSSAUpdater: Allow initialization with just a register class
The register class is required for inserting PHIs, but the "current
virtual register" isn't actually used for anything, so let's remove it
while we're at it.

Differential Revision: https://reviews.llvm.org/D85602

Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b
2020-08-21 23:04:35 +02:00
Alina Sbirlea 7ea0ee3058 [DomTree] Avoid creating an empty GD to reduce compile time. 2020-08-21 13:45:00 -07:00
Serguei Katkov 9e362bb0eb [InstCombine] Remove unused entries in gc-live bundle of statepoint
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.

Also the CL removes duplicated Values in gc-live bundle.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
2020-08-22 01:36:22 +07:00
Kamau Bridgeman 365f861c45 [PowerPC][PCRelative] Thread Local Storage Support for Initial Exec
This patch is the initial support for the Intial Exec Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D81947
2020-08-21 10:13:11 -05:00
diggerlin a081868921 [AIX][XCOFF] emit symbol visibility for xcoff object file.
SUMMARY:

Reviewers:  Jason liu

Differential Revision: https://reviews.llvm.org/D84265
2020-08-21 11:00:56 -04:00
Florian Hahn 8eded24bf4 Recommit "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
Recommit the patch after fixing an issue reported caused by the fact
that re-used values are also added to InsertedValues.

Additional tests have been added in 88818491b9

This reverts the revert commit 38884641f2.
2020-08-21 15:04:17 +01:00
Xing GUO f5643dc3dc Recommit: [DWARFYAML] Add support for referencing different abbrev tables.
The original commit (7ff0ace96db9164dcde232c36cab6519ea4fce8) was causing
build failure and was reverted in 6d242a7326

==================== Original Commit Message ====================
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.

The syntax is:
```
debug_abbrev:
  - ID: 0
    Table:
      ...
  - ID: 1
    Table:
      ...
debug_info:
  - ...
    AbbrevTableID: 1 ## Reference the second abbrev table.
  - ...
    AbbrevTableID: 0 ## Reference the first abbrev table.
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D83116
2020-08-21 19:02:10 +08:00
Roman Lebedev 5d7c5a5e99
[NFC] Port InstCount pass to new pass manager 2020-08-21 12:39:42 +03:00
Yevgeny Rouban 18bc400f97 [NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.

Reviewers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81555
2020-08-21 16:10:42 +07:00
David Green 2b69efded0 [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
Qiu Chaofan 91039784b3 [PowerPC] Add readflm/setflm intrinsics to Clang
Commit dbcfbffc adds ppc.readflm and ppc.setflm intrinsics to read or
write FPSCR register. This patch adds them to Clang.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D85874
2020-08-21 15:12:19 +08:00
Xing GUO 6d242a7326 Revert "[DWARFYAML] Add support for referencing different abbrev tables."
This reverts commit f7ff0ace96.

This change is causing build failure.

http://lab.llvm.org:8011/builders/clang-cmake-armv7-global-isel/builds/10400
2020-08-21 12:15:54 +08:00
Yevgeny Rouban 7d9a16241f [ADT] Allow IsSizeLessThanThresholdT for incomplete types. NFC
If the type T is incomplete then sizeof(T) results in C++ compilation error at line:
  static constexpr bool value = sizeof(T) <= (2 * sizeof(void *));

This patch allows incomplete types in parameters of function. Example:
  using SomeFunc = void(SomeIncompleteType &);
  llvm::unique_function<SomeFuncType> SomeFunc;

Reviewers: DaniilSuchkov, vvereschaka

Differential Revision: https://reviews.llvm.org/D81554
2020-08-21 11:01:57 +07:00
Xing GUO f7ff0ace96 [DWARFYAML] Add support for referencing different abbrev tables.
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.

The syntax is:
```
debug_abbrev:
  - ID: 0
    Table:
      ...
  - ID: 1
    Table:
      ...
debug_info:
  - ...
    AbbrevTableID: 1 ## Reference the second abbrev table.
  - ...
    AbbrevTableID: 0 ## Reference the first abbrev table.
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D83116
2020-08-21 11:44:25 +08:00
Xing GUO 290e399f96 [DWARFYAML] Add support for emitting multiple abbrev tables.
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.

Reviewed By: jhenderson, labath

Differential Revision: https://reviews.llvm.org/D86194
2020-08-21 10:12:08 +08:00
Michael Liao 5257a60ee0 [amdgpu] Add codegen support for HIP dynamic shared memory.
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
  the dynamic shared memory, which size is not known at the
  compile time.

Reviewers: arsenm, yaxunl, kpyzhov, b-sumner

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82496
2020-08-20 21:29:18 -04:00
Jon Roelofs 74ca5275e9 Fix a couple of typos. NFC 2020-08-20 14:56:57 -06:00
Kamau Bridgeman b74b80bb2d [PowerPC][PCRelative] Thread Local Storage Support for General Dynamic
This patch is the initial support for the General Dynamic Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.

Patch by: NeHuang

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D82315
2020-08-20 15:08:13 -05:00
Simon Pilgrim 0ee23b286a Fix Wdocumentation unknown parameter warning. NFC. 2020-08-20 12:41:34 +01:00
Vitaly Buka ebdc886b5f [APInt] Allow self-assignment with libstdc++
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8256/steps/test-check-all/logs/FAIL%3A%20LLVM%3A%3Athinlto-function-summary-paramaccess.ll

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D86053
2020-08-20 04:14:40 -07:00
Sebastian Neubauer b8d1994778 [AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887
2020-08-20 10:51:49 +02:00
Georgii Rymar a6436b0b3a [yaml2obj] - Make the 'Machine' key optional.
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.

This is cleaner, because avoids the need of using a particular machine.

In this patch I've made the 'Machine' key optional (the default value,
when it is not specified is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.

There are few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or doing anything more complex
than a removing a "Machine: *" line and formatting lines around.

Differential revision: https://reviews.llvm.org/D86202
2020-08-20 11:40:51 +03:00
Bevin Hansson f03b10f57e [IR] Add FixedPointBuilder.
This patch adds a convenience class for using FixedPointSemantics
to build fixed-point operations in IR.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85314
2020-08-20 10:29:57 +02:00
Bevin Hansson 1a995a0af3 [ADT] Move FixedPoint.h from Clang to LLVM.
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.

This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html

Reviewed By: leonardchan, rjmccall

Differential Revision: https://reviews.llvm.org/D85312
2020-08-20 10:29:45 +02:00
Johannes Doerfert 2f38c755ba Revert "[IR] Intrinsics default attributes and opt-out flag"
This commit introduced a non-trivial compile time regression that needs
to be addressed: https://reviews.llvm.org/D70365#2227627
Given that it is unclear how long that will take, I'll revert it for
now.

This reverts commit eedf18fc1f.
2020-08-20 00:25:32 -05:00
Matt Arsenault 31adc28d24 GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources
This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.
2020-08-19 18:53:24 -04:00
Francesco Petrogalli dac0b1d330 [llvm] Add default constructor of `llvm::ElementCount`.
This patch prevents failures like those reported in
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/34173.

We have enabled the default constructor for
`llvm::ElementCount` to make sure the code compiles on Windows.

Reviewed By: ormris

Differential Revision: https://reviews.llvm.org/D86240
2020-08-19 21:39:24 +00:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Matt Arsenault adbcc8e733 GlobalISel: Add TargetLowering member to LegalizerHelper 2020-08-19 14:50:35 -04:00
Matt Arsenault e95c08432a GlobalISel: Use Register 2020-08-19 13:45:31 -04:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Jessica Paquette d25b12bdc3 [GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.

This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.

In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.

e.g.

```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```

Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.

Differential Revision: https://reviews.llvm.org/D85463
2020-08-19 10:20:57 -07:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Bjorn Pettersson 54105d635d [GlobalISel] Untabify InstructionSelectorImpl.h. NFC 2020-08-19 12:00:00 +02:00
sstefan1 eedf18fc1f [IR] Intrinsics default attributes and opt-out flag
Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.

Differential Revision: https://reviews.llvm.org/D70365
2020-08-19 10:50:46 +02:00
Ronak Chauhan fdf71d486c Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit cacfb02d28.

Reverting due to buildbot failures.
2020-08-19 13:12:29 +05:30
David Sherwood 3f36561f69 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.

Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.

I've added new tests in

  CodeGen/AArch64/sve-split-load.ll
  CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorLoads.

Differential Revision: https://reviews.llvm.org/D85909
2020-08-19 07:54:32 +01:00
Yaxun (Sam) Liu 7546b29e76 [HIP] Support target id by --offload-arch
This patch introduces support of target id by
-offload-arch.

Differential Revision: https://reviews.llvm.org/D60620
2020-08-18 23:43:53 -04:00
Ronak Chauhan cacfb02d28 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D80713
2020-08-19 08:49:07 +05:30
Elliott Hughes a7d0b7a786 ld128 demangle: allow space for 'L' suffix.
Summary:
Caught by HWASAN on arm64 Android (which uses ld128 for long double). This
was running the existing fuzzer.

The specific minimized fuzz input to reproduce this is:

  __cxa_demangle("1\006ILeeeEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE", 0, 0, 0);

Reviewers: eugenis, srhines, #libc_abi!

Subscribers: kristof.beyls, danielkiss, libcxx-commits

Tags: #libc_abi

Differential Revision: https://reviews.llvm.org/D77924
2020-08-18 16:14:05 -07:00
Jessica Paquette bf36e90295 [GlobalISel][CallLowering] NFC: Unify flag-setting from CallBase + AttributeList
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.

Add a helper function, `addFlagsUsingAttrFn` which performs the attribute
setting.

Then, use wrappers for that function in `lowerCall` and `setArgFlags`.

(Note that the flag-setting code in `setArgFlags` was missing the returned
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)

Differential Revision: https://reviews.llvm.org/D86159
2020-08-18 11:07:33 -07:00
Matt Arsenault 5a15f6628e GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT
Add unit tests since AMDGPU will only trigger this for gigantic
vectors, and won't use the annoying odd sized breakdown case.
2020-08-18 13:51:19 -04:00
David Blaikie f7a49d2aa6 [WIP][DebugInfo] Lazily parse debug_loclist offsets
Parsing DWARFv5 debug_loclist offsets when a CU is parsed is weighing
down memory usage of symbolizers that don't need to parse this data at
all. There's not much benefit to caching these anyway - since they are
O(1) lookup and reading once you know where the offset list starts (and
can do bounds checking with the offset list size too).

In general, I think it might be time to start paying down some of the
technical debt of loc/loclist/range/rnglist parsing to try to unify it a
bit more.

eg:

* Currently DWARFUnit has: RangeSection, RangeSectionBase, LocSection,
  LocSectionBase, LocTable, RngListTable, LoclistTableHeader (be nice if
  these were all wrapped up in two variables - one for loclists, one for
  rnglists)

* rnglists and loclists are handled differently (see:
  LoclistTableHeader, but no RnglistTableHeader)

* maybe all these types could be less stateful - lazily parse what they
  need to, even reparsing rather than caching because it doesn't seem
  too expensive, for instance. (though admittedly so long as it's
  constantcost/overead per compilatiton that's probably adequate)

* Maybe implementing and using a DWARFDataExtractor that can be
  sub-ranged (so we could slice it up to just the single contribution) -
  though maybe that's not so useful because loc/ranges need to refer to
  it by absolute, not contribution-relative mechanisms

Differential Revision: https://reviews.llvm.org/D86110
2020-08-18 10:49:39 -07:00
Amara Emerson 04a6ea5d77 [GlobalISel] Add a combine for sext_inreg(load x), c --> sextload x
This is restricted to single use loads, which if we fold to sextloads we can
find more optimal addressing modes on AArch64.

This also fixes an overload the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.

Differential Revision: https://reviews.llvm.org/D85966
2020-08-18 10:42:15 -07:00
Amara Emerson 40e269ea6d [GlobalISel] Add a combine for ashr(shl x, c), c --> sext_inreg x, c'
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.

Differential Revision: https://reviews.llvm.org/D85965
2020-08-18 10:42:15 -07:00
Jessica Paquette 224a8c639e [GlobalISel][CallLowering] Look through call parameters for flags
We weren't looking through the parameters on calls at all.

E.g., say you had

```
declare i32 @zext(i32 zeroext %x)

...
%y = call i32 @zext(i32 %something)
...

```

At the point of the call, we wouldn't know that the %something should have the
zeroext attribute.

This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.

Differential Revision: https://reviews.llvm.org/D86125
2020-08-18 08:48:56 -07:00
Ronak Chauhan 7b777ee730 [ELF] Hide target specific methods as private
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86136
2020-08-18 18:26:08 +05:30
Ronak Chauhan e760e85680 [llvm-objdump][AMDGPU] Detect CPU string
AMDGPU ISA isn't backwards compatible and hence -mcpu must always be specified during disassembly.
However, the AMDGPU target CPU is stored in e_flags in the ELF object.

This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D84519
2020-08-18 17:43:16 +05:30
Shinji Okumura 5e361e2aa4 [Attributor] Deduce noundef attribute
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85184
2020-08-18 18:05:54 +09:00
Johannes Doerfert 858c75f7d1 [Attributor][NFC] Directly return proper type to avoid casts 2020-08-17 23:36:36 -05:00
Harmen Stoppels a52173a3e5 Use find_library for ncurses
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.

With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.

This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.

Differential revision: https://reviews.llvm.org/D85820
2020-08-17 19:52:52 -07:00
Amy Kwan c7ec3a7e33 [PowerPC] Implement Vector Extract Mask builtins in LLVM/Clang
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82675
2020-08-17 21:14:17 -05:00
Hamilton Tobon Mosquera 496f8e5b36 [OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
2020-08-17 20:56:10 -05:00
Hongtao Yu 819b2d9c79 [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.

So far only the X86 disassemblers are supported.

Test Plan:

llvm-objdump -d  --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr [rip + 4112]  # 202182 <g>
               	jge	0x20117e <_start+0x25>
               	call	0x201158 <foo>
               	inc	dword ptr [rsp]
               	jmp	0x201169 <_start+0x10>
               	xor	eax, eax
               	pop	rcx
               	ret
```

llvm-objdump -d  **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
<L1>:
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr  <g>
               	jge	 <L0>
               	call	 <foo>
               	inc	dword ptr [rsp]
               	jmp	 <L1>
<L0>:
               	xor	eax, eax
               	pop	rcx
               	ret
```

Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D84191
2020-08-17 16:55:12 -07:00
Matt Arsenault a128292b90 GlobalISel: Make type for lower action more consistently optional
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.

However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.

Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
2020-08-17 16:24:55 -04:00
diggerlin 2f0d755d81 [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
SUMMARY:

1. This patch provided API for decoding the traceback table info and unit test for the these API.

2. Another patchs will do the following things:
2.1 added a new option --traceback-table to decode the trace back table information for xcoff object file when
using llvm-objdump to disassemble the xcoff objfile.

2.2 print out the  traceback table information for llvm-objdump.

Reviewers:  Jason liu, Hubert Tong, James Henderson

Differential Revision: https://reviews.llvm.org/D81585
2020-08-17 16:23:47 -04:00
Dávid Bolvanský 0f14b2e6cb Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa71. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Valentin Clement 3060894bbb [flang][directives] Use TableGen to generate clause unparsing
Use the TableGen directive back-end to generate code for the clauses unparsing.

Reviewed By: sscalpone, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D85851
2020-08-17 14:22:25 -04:00
Matt Arsenault 5ca7c6386f GlobalISel: Fix parameter name in doxygen comment 2020-08-17 13:57:10 -04:00
Matt Arsenault fe171908e9 GlobalISel: Revisit users of other merge opcodes in artifact combiner
The artifact combiner searches for the uses of G_MERGE_VALUES for
unmerge/trunc that need further combining. This also needs to handle
the vector merge opcodes the same way. This fixes leaving behind some
pairs I expected to be removed, that were if the legalizer is run a
second time.
2020-08-17 13:56:53 -04:00
Matt Arsenault 924f31bc3c GlobalISel: Remove unnecessary check for copy type
COPY isn't allowed to change the type, but can mix no type with type.
2020-08-17 09:19:25 -04:00
Alex Zinenko 874aef875d [llvm] support graceful failure of DataLayout parsing
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D85650
2020-08-17 15:10:37 +02:00
Simon Pilgrim c1f6ce0c73 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Vitaly Buka e10e7829bf [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Fady Ghanim aaa93a681b [OpenMP][OMPBuilder] Adding support for `omp single`
This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.

Differential Revision: https://reviews.llvm.org/D85617
2020-08-16 01:15:16 -04:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Aditya Kumar 49a944af7f [NFC] Fix typo and variable names 2020-08-15 09:06:22 -07:00
Philip Reames 6b2105456a [Statepoint] Remove code related to inline operand bundles
This code becomes dead for valid IR after 48f4312 and a96fc46.  The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order.  By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports.  And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
2020-08-14 20:29:41 -07:00
Arthur Eubanks e6ea8779c2 [NewPM][optnone] Mark various passes as required
This was done by turning on -enable-npm-optnone and fixing failures.
That will be enabled in a follow-up change for ease of reverting.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85457
2020-08-14 15:51:59 -07:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Jordan Rupprecht 38884641f2 Temporarily revert "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
This reverts commit 7829c33084. The assertion is triggering on some internal code. A reduced test case is in progress.
2020-08-14 14:52:37 -07:00
Vitaly Buka fc4fd89852 [StackSafety] Use ValueInfo in ParamAccess::Call
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85269
2020-08-14 12:42:44 -07:00
Greg McGary eef41efe00 [MachO] Add skeletal support for DriverKit platform
Define the platform ID = 10, and simple mappings between platform ID & name.

Reviewed By: MaskRay, cishida

Differential Revision: https://reviews.llvm.org/D85594
2020-08-14 12:36:43 -07:00
Matt Arsenault 5c5e6d951e TableGen/GlobalISel: Partially handle immAllOnesV/immAllZerosV
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.

I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.

Also doesn't try to handle undef elements like the DAG version.
2020-08-14 13:55:30 -04:00
Bjorn Pettersson b395d67a88 [Orc] Fix werror for unused variable in noasserts build 2020-08-14 15:58:04 +02:00
Stefan Gränitz 28e1015e32 [ORC] Fix missing include in OrcRemoteTargetClient.h 2020-08-14 12:00:18 +02:00
Stefan Gränitz 6bf74a924f [ORC] In LLLazyJIT provide public access to the CompileOnDemandLayer
This is analog to how LLJIT provides public access to all its layers.

Differential Revision: https://reviews.llvm.org/D85921
2020-08-14 11:34:44 +02:00
Stefan Gränitz 30c4561e36 [ORC] Add JITLink-compatible remote memory-manager and LLJITWithChildProcess example
This adds RemoteJITLinkMemoryManager is a new subclass of OrcRemoteTargetClient. It implements jitlink::JITLinkMemoryManager and targets the OrcRemoteTargetRPCAPI.

Behavior should be very similar to RemoteRTDyldMemoryManager. The essential differnce with JITLink is that allocations work in isolation from its memory manager. Thus, the RemoteJITLinkMemoryManager might be seen as "JITLink allocation factory".

RPCMMAlloc is another subclass of OrcRemoteTargetClient and implements the actual functionality. It allocates working memory on the host and target memory on the remote target. Upon finalization working memory is copied over to the tagrte address space. Finalization can be asynchronous for JITLink allocations, but I don't see that it makes a difference here.

Differential Revision: https://reviews.llvm.org/D85919
2020-08-14 11:34:44 +02:00
Yuanfang Chen a5ed20b549 [NewPM][CodeGen] Add machine code verification callback
D83608 need this.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D85916
2020-08-13 16:13:01 -07:00
Stefan Gränitz f12db8cf75 [ORC] cloneToNewContext() can work with a const-ref to ThreadSafeModule 2020-08-13 21:01:21 +02:00
Stefan Gränitz 34a5669ccd [ORC] Fix SymbolLookupSet::containsDuplicates() 2020-08-13 21:01:21 +02:00
Haowei Wu d650cbc349 [elfabi] Move llvm-elfabi related code to InterfaceStub library
This change moves elfabi related code to llvm/InterfaceStub library
so it can be shared by multiple llvm tools without causing cyclic
dependencies.

Differential Revision: https://reviews.llvm.org/D85678
2020-08-13 11:51:44 -07:00
Sameer Arora 8d58eb11f9 [llvm-libtool-darwin] Refactor ArchiveWriter
Refactoring function `writeArchive` in ArchiveWriter. Added a new
function `writeArchiveBuffer` that returns the archive in a memory
buffer instead of writing it out to the disk. This refactor is necessary
so as to allow `llvm-libtool-darwin` to write universal files containing
archives.

Reviewed by jhenderson, MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D84858
2020-08-13 10:56:30 -07:00
Dávid Bolvanský 50c743fa71 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
Fangrui Song 7f8c49b016 [llvm-objdump] Change symbol name/PLT decoding errors to warnings
If the referenced symbol of a J[U]MP_SLOT is invalid (e.g. symbol index 0), llvm-objdump -d will bail out:

```
error: 'a': st_name (0x326600) is past the end of the string table of size 0x7
```

where 0x326600 is the st_name field of the first entry past the end of .symtab

Change it to a warning to continue dumping.
`X86/plt.test` uses a prebuilt executable, so I pick `ELF/AArch64/plt.test`
which has a YAML input and can be easily modified.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85623
2020-08-13 08:13:42 -07:00
David Stenberg e8ebebb0bd [InstCombine] Fix incorrect Modified status
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.

This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics that were removed, and take
that into account.

This was caught using the check introduced by D80916.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D85839
2020-08-13 15:10:41 +02:00
Dávid Bolvanský f9264995a6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský 44587e2f7e [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Dávid Bolvanský a0485421d2 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f.
2020-08-13 12:59:15 +02:00
Dávid Bolvanský 385c9d673f [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00