Commit Graph

126735 Commits

Author SHA1 Message Date
Jonas Devlieghere de0ce98abe [DebugLine] Don't try to guess the path style
In r368879 I made an attempt to guess the path style from the files in
the line table. After some consideration I now think this is a poor
idea. This patch undoes that behavior and instead adds an optional
argument to specify the path style. This allows us to make that decision
elsewhere where we have more information. In case of LLDB based on the
Unit.

llvm-svn: 369072
2019-08-15 23:53:15 +00:00
Volkan Keles 0ae6006bee [GlobalISel] CSEMIRBuilder: Add support for G_GEP
Summary:
This patch adds G_GEP to `shouldCSEOpc` so that it can be CSEd. It also refactors
`translateGetElementPtr` by replacing `createGenericVirtualRegister` calls with types.

Reviewers: aditya_nandakumar, arsenm, dsanders, paquette, aemerson

Reviewed By: aditya_nandakumar

Subscribers: wdng, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66316

llvm-svn: 369070
2019-08-15 23:45:45 +00:00
Eli Friedman 9b9a308452 [ARM][LowOverheadLoops] Fix generated code for "revert".
Two issues:

1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't
really have any visible effect at the moment, but it might matter in the
future.
2. The t2CMPri generated for t2WhileLoopStart might need to use a
register that isn't LR.

My team found this because we have a patch to track register liveness
late in the pass pipeline. I'll look into upstreaming it to help catch
issues like this earlier.

Differential Revision: https://reviews.llvm.org/D66243

llvm-svn: 369069
2019-08-15 23:35:53 +00:00
Jonas Devlieghere 6d6babf745 [Support] Re-introduce the RWMutexImpl for macOS < 10.12
In r369018, Benjamin replaced the custom RWMutex implementation with
their C++14 counterpart. Unfortunately, std::shared_timed_mutex is only
available on macOS 10.12 and later. This prevents LLVM from compiling
even on newer versions of the OS when you have an older deployment
target. This patch reintroduced the old RWMutexImpl but guards it by the
macOS availability macro.

Differential revision: https://reviews.llvm.org/D66313

llvm-svn: 369064
2019-08-15 23:07:20 +00:00
Evgeniy Stepanov 75344955fc Move isPointerOffset function to ValueTracking (NFC).
Summary: To be reused in MemTag sanitizer.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66165

llvm-svn: 369062
2019-08-15 22:58:28 +00:00
Philip Reames 5c38ca3534 [SDAG] Minor code cleanup/standardization of atomic accessors [NFC]
llvm-svn: 369057
2019-08-15 22:21:14 +00:00
Evgeniy Stepanov 10ce5f88d1 Add missing MIR serialization text for AArch64II::MO_TAGGED.
Reviewers: pcc

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66312

llvm-svn: 369053
2019-08-15 22:03:55 +00:00
Alina Sbirlea 79ff20428e [MemorySSA] Remove restrictive asserts.
The verification I added has overly restrictive asserts.
Unreachable blocks can have any incoming value in practice, after an
update due to a "replaceAllUses" call when the repalced entry is
LiveOnEntry.

llvm-svn: 369050
2019-08-15 21:20:08 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Krzysztof Parzyszek 8e987702b1 [Hexagon] Fix instruction selection for vselect v4i8
llvm-svn: 369040
2019-08-15 19:20:09 +00:00
Matt Arsenault 1f2b727298 MVT: Add v3i16/v3f16 vectors
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.

Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.

llvm-svn: 369038
2019-08-15 18:58:25 +00:00
Philip Reames d202899431 [NFC] Add a couple of dump routines for RegisterPressure helper classes
llvm-svn: 369037
2019-08-15 18:49:39 +00:00
Florian Hahn 3f2850bc60 [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61669

llvm-svn: 369036
2019-08-15 18:39:56 +00:00
Craig Topper 2a372ba534 [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory.
llvm-svn: 369031
2019-08-15 18:23:37 +00:00
Benjamin Kramer 2e62396c2f Link libpthread into LLVMCore.so
After r369018 the compiler can inline pthread calls into users of
RWMutex.

llvm-svn: 369029
2019-08-15 18:06:30 +00:00
Pavel Labath 11d9e46f8e Revert "MemoryBuffer: Add a missing error-check to getOpenFileImpl"
This reverts commit r368977 because it broke a couple of tests in lldb.

llvm-svn: 369027
2019-08-15 17:52:40 +00:00
Jeremy Morse c476124bc8 [DebugInfo] Avoid crash from dropped fragments in LiveDebugValues
This patch avoids a crash caused by DW_OP_LLVM_fragments being dropped
from DIExpressions by LiveDebugValues spill-restore code. The appearance
of a previously unseen fragment configuration confuses LDV, as documented
in PR42773, and reproduced by the test function this patch adds (Crashes
on a x86_64 debug build).

To avoid this, on spill restore, we now use fragment information from the
spilt-location-expression.

In addition, when spilling, we now don't spill any DBG_VALUE with a complex
expression, as it can't be safely restored and will definitely lead to an
incorrect variable location. The discussion of this is in D65368.

Differential Revision: https://reviews.llvm.org/D66284

llvm-svn: 369026
2019-08-15 17:49:46 +00:00
Mark Lacey 626ed22fbe [CallGraph] Refine call graph for indirect calls with !callees metadata
For indirect call sites having a small set of possible callees,
!callees metadata can be used to indicate what those callees are.
This patch updates the call graph and lazy call graph analyses so
that they consider this metadata when encountering call sites. For
the call graph, it adds a new external call graph node to the graph
for each unique !callees metadata node. A call graph edge connects
an indirect call site with the external node associated with the
!callees metadata that is attached to it. And there is an edge from
this external node to each of the callees indicated by the metadata.
Similarly, for the lazy call graph, the patch adds Ref edges from a
caller to the possible callees indicated by the metadata.

The primary purpose of the patch is to facilitate iterating over the
functions in a module such that all of the callees indicated by a
given !callees metadata node will be visited prior to the functions
containing call sites annotated by that node. This property is
required by optimizations performing a bottom-up traversal of the
SCC DAG. For example, the inliner can be made to inline through an
indirect call. If the call site is annotated with !callees metadata,
this patch ensures that the inliner will have visited all of the
callees prior to the caller, allowing it to reliably compute the
cost of inlining one or more of the potential callees.

Original patch by @mssimpso. I've made some small changes to get it
to apply, build, and pass tests on the top of tree, as well as
some minor tweaks to formatting and functionality.

Subscribers: mehdi_amini, hiraditya, llvm-commits, mssimpso

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D39339

llvm-svn: 369025
2019-08-15 17:47:53 +00:00
Taewook Oh 213d8a9f13 [NewPM][PassInstrumentation] IR printing support for (Thin)LTO
Summary: IR printing has not been correctly supported with (Thin)LTO if the new pass manager is enabled. Previously we only get outputs from backend(codegen) passes, as they are still under legacy pass manager even when the new pass manager is enabled. This patch addresses the issue and enables IR printing for optimization passes with new pass manager + (Thin)LTO setting.

Reviewers: fedor.sergeev, philip.pfaffe

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66253

llvm-svn: 369024
2019-08-15 17:47:44 +00:00
Craig Topper 6eebd2bcd7 [X86] Improve cost model for subvector extraction of less than 128-bit vectors
Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.

Differential Revision: https://reviews.llvm.org/D65892

llvm-svn: 369022
2019-08-15 17:29:42 +00:00
Benjamin Kramer 8d3a1523dd [Support] Base RWMutex on std::shared_timed_mutex (C++14)
This should have the same semantics. We use std::shared_mutex instead on
MSVC and C++17, std::shared_timed_mutex is less efficient than our
custom implementation on Windows, std::shared_mutex should be faster.

llvm-svn: 369018
2019-08-15 16:55:23 +00:00
Krzysztof Parzyszek 8460301d58 [Hexagon] Generate vector min/max for HVX
llvm-svn: 369014
2019-08-15 16:13:17 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Andrea Di Biagio 3de2f0330f [MCA] Slightly refactor class RetireControlUnit, and add the ability to override the mask of used buffered resources in class mca::Instruction. NFCI
This patch teaches the RCU how to peek 'next' RCUTokens. A new method has been
added to the RetireControlUnit class with the goal of minimizing the complexity
of follow-up patches that will enable macro-fusion support in mca.

This patch also adds method Instruction::getNumMicroOpcodes() to simplify common
interactions with the instruction descriptor (a pattern quite common in some
pipeline stages).

Added the ability to override the default set of consumed scheduler resources
(this -again- is to simplify future patches that add support for macro-op fusion).

No functional change intended.

llvm-svn: 369010
2019-08-15 15:27:40 +00:00
Simon Pilgrim d4df81f463 Remove SmallBitVector.h include. NFCI.
SmallBitVector/BitVector types aren't used at all in the cpp file.

llvm-svn: 369008
2019-08-15 14:40:37 +00:00
Simon Pilgrim 983e9118a2 Remove BitVector.h include. NFCI.
BitVector type isn't used at all in the cpp file.

llvm-svn: 369007
2019-08-15 14:39:28 +00:00
Jinsong Ji 9fd81dc139 [PowerPC] Use xxleqv to set all one vector IMM(-1).
Summary:
xxspltib/vspltisb are 3 cycle PM instructions,
xxleqv is 2 cycle ALU instruction.

We should use xxleqv to set all one vectors.

Reviewers: hfinkel, nemanjai, steven.zhang

Subscribers: hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65529

llvm-svn: 369006
2019-08-15 14:32:51 +00:00
Simon Pilgrim ed804dad1e [DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI.
Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types.

Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little.

llvm-svn: 368998
2019-08-15 13:07:14 +00:00
Kang Zhang 2a903c0b67 [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:

This patch has trigger a bug of r368339, and the r368339 has been reverted, So upstream this patch again.

In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D63972

llvm-svn: 368997
2019-08-15 13:05:16 +00:00
David Green 3a99101812 [ARM] Fix alignment checks for BE VLDRH
We need to allow any alignment at least 2, not just exactly 2, so that the big
endian loads and stores can be selected successfully. I've also added extra BE
testing for the load and store tests.

Thanks to Oliver for the report.

Differential Revision: https://reviews.llvm.org/D66222

llvm-svn: 368996
2019-08-15 12:54:47 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
Andrea Di Biagio 7aa0dbb664 [MCA] Slightly refactor the logic in ResourceManager. NFCI
This patch slightly changes the API in the attempt to simplify resource buffer
queries. It is done in preparation for a patch that will enable support for
macro fusion.

llvm-svn: 368994
2019-08-15 12:39:55 +00:00
Florian Hahn fd72bf21c9 [ValueTracking] Add MustPreserveNullness arg to functions analyzing calls. (NFC)
Some uses of getArgumentAliasingToReturnedPointer and
isIntrinsicReturningPointerAliasingArgumentWithoutCapturing require the
calls/intrinsics to preserve the nullness of the argument.

For alias analysis, the nullness property does not really come into
play.

This patch explicitly sets it to true. In D61669, the alias analysis
uses will be switched to not require preserving nullness.

Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64150

llvm-svn: 368993
2019-08-15 12:13:02 +00:00
David Green 0ff2296a49 [ARM] MVE predicate store patterns
Stack loads and stores were already working, but direct stores were not. This
adds the patterns for them, same as predicate loads.

Differential Revision: https://reviews.llvm.org/D66213

llvm-svn: 368988
2019-08-15 10:41:42 +00:00
Sander de Smalen 643adb5576 [AArch64] Change location of frame-record within callee-save area.
This patch changes the location of the frame-record (FP, LR) to the 
bottom of the callee-saved area. According to the AAPCS the location of
the frame-record within the stackframe is unspecified (section 5.2.3 The 
Frame Pointer), so the compiler should be free to choose a different
location.

The reason for changing the location of the frame-record is to prepare
the frame for allocating an SVE area below the callee-saves. This way the 
compiler can use the VL-scaled addressing modes to directly access SVE 
objects from the frame-pointer.

            :                :   
        | stack |        | stack |
        |  args |        |  args |
        +-------+        +-------+
        |  x30  |        |  x19  |
        |  x29  |        |  x20  |
  FP -> |- - - -|        |  x21  |
        |  x19  |   ==>  |  x22  |
        |  x20  |        |- - - -|
        |  x21  |        |  x30  |
        |  x22  |        |  x29  |
        +-------+        +-------+ <- FP
        |///////|        |///////|         // realignment gap 
        |- - - -|        |- - - -|
        |spills/|        |spills/|
        | locals|        | locals|
  SP -> +-------+        +-------+ <- SP

Things to point out:
- The algorithm to find a paired register should be prevented from
  accidentally pairing some callee-saved register with LR that is not 
  FP, since they should always be paired together when the frame
  has a frame-record.
- For Darwin platforms the location of the frame-record is unchanged,
  since the unwind encoding does not allow for encoding this position
  dynamically and other tools currently depend on the former layout. 

Reviewers: efriedma, rovka, rengolin, thegameg, greened, t.p.northover

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65653

llvm-svn: 368987
2019-08-15 10:34:16 +00:00
Florian Hahn de1d6c8220 Add ptrmask intrinsic
This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.

This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.

Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune

Reviewed by: sanjoy, jdoerfert

Differential Revision: https://reviews.llvm.org/D59065

llvm-svn: 368986
2019-08-15 10:12:26 +00:00
Sven van Haastregt 0096d1938e [Support] Fix Wundef warning
llvm-svn: 368984
2019-08-15 10:05:22 +00:00
David Green 04f2f32869 [ARM] MVE trunc to i1 vectors
This adds patterns for selecting trunc instructions from full vectors to i1's
vectors.

Differential Revision: https://reviews.llvm.org/D66201

llvm-svn: 368981
2019-08-15 09:26:51 +00:00
Pavel Labath 46bfdb956c MemoryBuffer: Add a missing error-check to getOpenFileImpl
Summary:
In case the function was called with a desired read size *and* the file
was not an "mmap()" candidate, the function was falling back to a
"pread()", but it was failing to check the result of that system call.
This meant that the function would return "success" even though the read
operation failed, and it returned a buffer full of uninitialized memory.

Reviewers: rnk, dblaikie

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66224

llvm-svn: 368977
2019-08-15 08:20:15 +00:00
Dorit Nuzman d57d73daed [LV] fold-tail predication should be respected even with assume_safety
assume_safety implies that loads under "if's" can be safely executed
speculatively (unguarded, unmasked). However this assumption holds only for the
original user "if's", not those introduced by the compiler, such as the
fold-tail "if" that guards us from loading beyond the original loop trip-count.
Currently the combination of fold-tail and assume-safety pragmas results in
ignoring the fold-tail predicate that guards the loads, generating unmasked
loads. This patch fixes this behavior.

Differential Revision: https://reviews.llvm.org/D66106

Reviewers: Ayal, hsaito, fhahn
llvm-svn: 368973
2019-08-15 07:12:14 +00:00
Craig Topper 1e246b20c0 [X86] Add isel pattern to match VZEXT_MOVL and a v2i64 scalar_to_vector bitcasted from x86mmx to MOVQ2DQ.
We already had the pattern for just the scalar to vector and bitcast,
but not the case where we wanted zeroes in the high half of the xmm.

llvm-svn: 368972
2019-08-15 06:46:30 +00:00
Craig Topper e6409602a1 [X86] Make sure load is non-volatile in the MMX_X86movdq2q (loadv2i64) isel pattern.
This pattern will narrow the load so we should make sure its not
volatile.

llvm-svn: 368971
2019-08-15 06:46:26 +00:00
Craig Topper dbcbbf5658 [X86] Remove unneeded isel pattern for v4f32->v4i32 fp_to_sint and conversion to MMX.
fp_to_sint is turned into X86cvttp2si during isel preprocessing.
The other redundant isel patterns were removed previously, but I
missed this one because its in the MMX td file.

llvm-svn: 368968
2019-08-15 05:52:02 +00:00
Craig Topper a57734ba4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64.
The default legalization can take care of this.

llvm-svn: 368967
2019-08-15 05:51:58 +00:00
Craig Topper 57286afe4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast.
The generic legalization handles this in the same way so just use
that.

llvm-svn: 368966
2019-08-15 05:51:54 +00:00
Craig Topper ba39fcd8c6 [X86] Remove some unreachable code from LowerBITCAST.
llvm-svn: 368965
2019-08-15 05:51:50 +00:00
Michael Pozulp 9abf668c08 [llvm-objdump] Add warning messages if disassembly + source for problematic inputs
Summary: Addresses https://bugs.llvm.org/show_bug.cgi?id=41905

Reviewers: jhenderson, rupprecht, grimar

Reviewed By: jhenderson, grimar

Subscribers: RKSimon, MaskRay, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62462

llvm-svn: 368963
2019-08-15 05:15:22 +00:00
Craig Topper 14f7560020 [X86] Remove some dead code and combine some repeated code that's left.
If the width is 256 bits, then we must have AVX so the else here
was unnecessary. Once that's removed then the >= 256 bit code is
identical to the 128 bit code with a different VT so combine them.

llvm-svn: 368956
2019-08-15 04:07:43 +00:00
Jonas Devlieghere ed3b6d1bb2 Revert "Expose TailCallKind via the LLVM C API"
This is failing on several build bots. Reverting as discussed in
https://reviews.llvm.org/D66061.

llvm-svn: 368953
2019-08-15 03:49:51 +00:00
Gor Nishanov efe0093404 [coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem.
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=36578 and https://bugs.llvm.org/show_bug.cgi?id=36296.
Supersedes: https://reviews.llvm.org/D55966

One of the fundamental transformation that CoroSplit pass performs before splitting the coroutine is to find which values need to survive between suspend and resume and provide a slot for them in the coroutine frame to spill and restore the value as needed.

Coroutine frame becomes available once the storage for it was allocated and that point is marked in the pre-split coroutine with a llvm.coro.begin intrinsic.

FE normally puts all of the user-authored code that would be accessing those values after llvm.coro.begin, however, sometimes instructions accessing those values would end up prior to coro.begin. For example, writing out a value of the parameter into the alloca done by the FE or instructions that are added by the optimization passes such as SROA when it rewrites allocas.

Prior to this change, CoroSplit pass would try to move instructions that may end up accessing the values in the coroutine frame after CoroBegin. However it would run into problems (report_fatal_error) if some of the values would be used both in the allocation function (for example allocator is passed as a parameter to a coroutine) and in the use-authored body of the coroutine.

To handle this case and to simplify the instruction moving logic, this change removes all of the instruction moving. Instead, we only change the uses of the spilled values that are dominated by coro.begin and leave other instructions intact.

Before:

```
%var = alloca i32
%1 = getelementptr .. %var; ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(
```

After:

```
%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin(
```
If we discover that there is a potential write into an alloca, prior to coro.begin we would copy its value from the alloca into the spill slot in the coroutine frame.

Before:

```
%var = alloca i32
store .. %var ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(
```

After:

```
%var = alloca i32
store .. %var ;stays put
%f = call i8* @llvm.coro.begin(
%tmp = load %var
store %tmp, %spill.slot.for.var
```

Note: This change does not handle array allocas as that is something that C++ FE does not produce, but, it can be added in the future if need arises

Reviewers: llvm-commits, modocache, ben-clayton, tks2103, rjmccall

Reviewed By: modocache

Subscribers: bartdesmet

Differential Revision: https://reviews.llvm.org/D66230

llvm-svn: 368949
2019-08-15 00:48:51 +00:00
Robert Widmann 708c4605a1 Expose TailCallKind via the LLVM C API
Summary: This exposes `CallInst`'s tail call kind via new `LLVMGetTailCallKind` and `LLVMSetTailCallKind` functions. The motivation for this is to be able to see `musttail` for languages that require mandatory tail calls for correctness. Today only the weaker `LLVMSetTail` is exposed and there is no way to set `GuaranteedTailCallOpt` via the C API.

Reviewers: CodaFi, jyknight, deadalnix, rnk

Reviewed By: CodaFi

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66061

llvm-svn: 368945
2019-08-14 23:54:35 +00:00
Johannes Doerfert 54f6be7b83 [Attributor] Try to fix "missing field 'RetInsts' initializer" warning
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/35674/steps/build_Lld/logs/stdio

llvm-svn: 368938
2019-08-14 22:32:29 +00:00
Johannes Doerfert 5304b72a81 [Attributor][NFC] Make debug output consistent
llvm-svn: 368931
2019-08-14 22:04:28 +00:00
Philip Reames 7b0515176b [SCEV] Rename getMaxBackedgeTakenCount to getConstantMaxBackedgeTakenCount [NFC]
llvm-svn: 368930
2019-08-14 21:58:13 +00:00
Johannes Doerfert 4395b31d99 [Attributor][NFC] Try to eliminate warnings (debug build + fall through)
llvm-svn: 368928
2019-08-14 21:46:28 +00:00
Johannes Doerfert 17b578bc75 [Attributor][NFC] Introduce statistics macros for new positions
llvm-svn: 368927
2019-08-14 21:46:25 +00:00
Craig Topper e7ea06b7d2 [SelectionDAGBuilder] Teach gather/scatter getUniformBase to look through vector zeroinitializer indices in addition to scalar zeroes.
llvm-svn: 368926
2019-08-14 21:38:56 +00:00
Johannes Doerfert e1e844d6b0 [Attributor][NFC] Add merge/join/clamp operators to the IntegerState
Differential Revision: https://reviews.llvm.org/D66146

llvm-svn: 368925
2019-08-14 21:35:20 +00:00
Johannes Doerfert 6a1274a52e [Attributor] Use the AANoNull attribute directly in AADereferenceable
Summary:
Instead of constantly keeping track of the nonnull status with the
dereferenceable information we can simply query the nonnull attribute
whenever we need the information (debug + manifest).

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66113

llvm-svn: 368924
2019-08-14 21:31:32 +00:00
Amara Emerson 1222cfd5fe [AArch64][GlobalISel] Custom selection for s8 load acquire.
Implement this single atomic load instruction so that we can compile stack
protector code.

Differential Revision: https://reviews.llvm.org/D66245

llvm-svn: 368923
2019-08-14 21:30:30 +00:00
Johannes Doerfert def9928204 [Attributor] Use liveness during the creation of AAReturnedValues
Summary:
As one of the first attributes, and one of the complex ones,
AAReturnedValues was not using liveness but we filtered the result after
the fact. This change adds liveness usage during the creation. The
algorithm is also improved and shorter.

The new algorithm will collect returned values over time using the
generic facilities that work with liveness already, e.g.,
genericValueTraversal which does not look at dead PHI node predecessors.
A test to show how this leads to better results is included.

Note: Unresolved calls and resolved calls are now tracked explicitly.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66120

llvm-svn: 368922
2019-08-14 21:29:37 +00:00
Johannes Doerfert 9a1a1f96d9 [Attributor] Do not update or manifest dead attributes
Summary:
If the associated context instruction is assumed dead we do not need to
update or manifest the state.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66116

llvm-svn: 368921
2019-08-14 21:25:08 +00:00
Johannes Doerfert 710ebb03ed [Attributor] Use IRPosition consistently
Summary:
The next attempt to clean up the Attributor interface before we grow it
further.

Before, we used a combination of two values (associated + anchor) and an
argument number (or -1) to determine a location. This was very fragile.
The new system uses exclusively IR positions and we restrict the
generation of IR positions to special constructor methods that verify
internal constraints we have. This will catch misuse early.

The auto-conversion, e.g., in getAAFor, is now performed through the
SubsumingPositionIterator. This iterator takes an IR position and allows
to visit all IR positions that "subsume" the given one, e.g., function
attributes "subsume" argument attributes of that function. For a
detailed breakdown see the class comment of SubsumingPositionIterator.

This patch also introduces the IRPosition::getAttrs() to extract IR
attributes at a certain position. The method knows how to look up in
different positions that are equivalent, e.g., the argument position for
call site arguments. We also introduce three new positions kinds such
that we have all IR positions where attributes can be placed and one for
"floating" values.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65977

llvm-svn: 368919
2019-08-14 21:18:01 +00:00
Dinar Temirbulatov da0435a690 [SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes.
llvm-svn: 368906
2019-08-14 19:46:50 +00:00
Sanjay Patel ecccf29e6c [SDAG] move variable closer to use; NFC
llvm-svn: 368905
2019-08-14 19:46:15 +00:00
Jan Korous 14230f9926 [Support][NFC] Fix error message for posix_spawn_file_actions_addopen failed call
Seems like a copy-paste from couple lines above.

llvm-svn: 368899
2019-08-14 18:30:18 +00:00
Philip Reames 6cca3ad43e [RLEV] Rewrite loop exit values for multiple exit loops w/o overall loop exit count
We already supported rewriting loop exit values for multiple exit loops, but if any of the loop exits were not computable, we gave up on all loop exit values. This patch generalizes the existing code to handle individual computable loop exits where possible.

As discussed in the review, this is a starting point for figuring out a better API.  The code is a bit ugly, but getting it in lets us test as we go.  

Differential Revision: https://reviews.llvm.org/D65544

llvm-svn: 368898
2019-08-14 18:27:57 +00:00
Matt Arsenault dbc1f207fa InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need th enums.

llvm-svn: 368895
2019-08-14 18:13:00 +00:00
Matt Arsenault 0eac2a2963 InferAddressSpaces: Remove unnecessary check for ConstantInt
The IR is invalid if this isn't a constant since immarg was added.

llvm-svn: 368893
2019-08-14 18:01:42 +00:00
Taewook Oh df7022825c [DebugInfo] Consider debug label scope has an extra lexical block file
Summary: There are places where a case that debug label scope has an extra lexical block file is not considered properly. The modified test won't pass without this patch.

Reviewers: aprantl, HsiangKai

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66187

llvm-svn: 368891
2019-08-14 17:58:45 +00:00
David Bolvansky f94460d4b6 [SLC] Dereferenceable annonation - handle valid null pointers
Reviewers: jdoerfert, reames

Reviewed By: jdoerfert

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66161

llvm-svn: 368884
2019-08-14 17:15:20 +00:00
Jonas Devlieghere c0a9b1edca [DebugLine] Improve path handling.
After switching over LLDB's line table parser to libDebugInfo, we
noticed two regressions on the Windows bot. The problem is that when
obtaining a file from the line table prologue, we append paths without
specifying a path style. This leads to incorrect results on Windows for
debug info containing Posix paths:

  0x0000000000201000: /tmp\b.c, is_start_of_statement = TRUE

This patch is an attempt to fix that by guessing the path style whenever
possible.

Differential revision: https://reviews.llvm.org/D66227

llvm-svn: 368879
2019-08-14 17:00:10 +00:00
David Bolvansky 0e0fbae1a4 [BuildLibCalls] Noalias annotation
Summary: I think this is better solution than annotating callsites in IC/SLC.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66217

llvm-svn: 368875
2019-08-14 16:50:06 +00:00
Bill Wendling cc2bebe039 Ignore indirect branches from callbr.
Summary:
We can't speculate around indirect branches: indirectbr and invoke. The
callbr instruction needs to be included here.

Reviewers: nickdesaulniers, manojgupta, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66200

llvm-svn: 368873
2019-08-14 16:44:07 +00:00
Thomas Lively de0133eaa2 [WebAssembly] Stop unrolling SIMD shifts since they are fixed in V8
Summary:
Fixes PR42973. Tests don't change because simd-arith.ll tests behavior
on unimplemented-simd128, which does not include any temporary
workarounds such as the one removed in this revision.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66166

llvm-svn: 368868
2019-08-14 16:24:37 +00:00
Craig Topper 3e44d96170 [X86] Use PSADBW for v8i8 addition reductions.
Improves the 8 byte case from PR42674.

Differential Revision: https://reviews.llvm.org/D66069

llvm-svn: 368864
2019-08-14 15:57:29 +00:00
Xiangling Liao 49661f94c8 [NFC][AIX] Change assertion
Address one left comment on https://reviews.llvm.org/D63547. A minor
change for assertion.

Differential Revision: https://reviews.llvm.org/D63547

llvm-svn: 368860
2019-08-14 14:57:25 +00:00
Craig Topper 30d3e9c395 [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs
Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.

For AVX2, when the destination type is legal its clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high.

Differential Revision: https://reviews.llvm.org/D66169

llvm-svn: 368858
2019-08-14 14:52:39 +00:00
Craig Topper 8c545168ee [X86] Add llvm_unreachable to a switch that covers all expected values.
llvm-svn: 368857
2019-08-14 14:51:19 +00:00
Jinsong Ji e71db6584d [PowerPC][NFC] Consolidate duplicate XX3Form_SetZero and XX3Form_Zero.
Rename one to XX3Form_SameOp, remove the other one.

llvm-svn: 368856
2019-08-14 14:16:26 +00:00
Jason Liu 8fc095d453 [AIX] Add call lowering for parameters that could pass onto FPRs
Summary:
This patch adds call lowering functionality to enable passing
parameters onto floating point registers when needed.

Differential Revision: https://reviews.llvm.org/D63654

llvm-svn: 368855
2019-08-14 14:13:11 +00:00
Pavel Labath 0d802a4923 Revert "raw_ostream: add operator<< overload for std::error_code"
This reverts commit r368849, because it breaks some bots (e.g.
llvm-clang-x86_64-win-fast).

It turns out this is not as NFC as we had hoped, because operator== will
consider two std::error_codes to be distinct even though they both hold
"success" values if they have different categories.

llvm-svn: 368854
2019-08-14 13:59:04 +00:00
Pavel Labath 40837e97b1 raw_ostream: add operator<< overload for std::error_code
Summary:
The main motivation for this is unit tests, which contain a large macro
for pretty-printing std::error_code, and this macro is duplicated in
every file that needs to do this. However, the functionality may be
useful elsewhere too.

In this patch I have reimplemented the existing ASSERT_NO_ERROR macros
to reuse the new functionality, but I have kept the macro (as a
one-liner) as it is slightly more readable than ASSERT_EQ(...,
std::error_code()).

Reviewers: sammccall, ilya-biryukov

Subscribers: zturner, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65643

llvm-svn: 368849
2019-08-14 13:33:28 +00:00
Jeremy Morse 90c2794bfc [DebugInfo] MCP: collect and update DBG_VALUEs encountered in local block
MCP currently uses changeDebugValuesDefReg / collectDebugValues to find
debug users of a register, however those functions assume that all
DBG_VALUEs immediately follow the specified instruction, which isn't
necessarily true. This is going to become very often untrue when we turn
off CodeGenPrepare::placeDbgValues.

Instead of calling changeDebugValuesDefReg on an instruction to change its
debug users, in this patch we instead collect DBG_VALUEs of copies as we
iterate over insns, and update the debug users of copies that are made
dead. This isn't a non-functional change, because MCP will now update
DBG_VALUEs that aren't immediately after a copy, but refer to the same
register. I've hijacked the regression test for PR38773 to test for this
new behaviour, an entirely new test seemed overkill.

Differential Revision: https://reviews.llvm.org/D56265

llvm-svn: 368835
2019-08-14 12:20:02 +00:00
Fangrui Song 4c8deb6172 [IR] Simplify removeDeadConstantUsers. NFC
llvm-svn: 368833
2019-08-14 11:38:45 +00:00
Simon Pilgrim 828a89e244 Fix "not all control paths return a value" MSVC warnings. NFCI.
llvm-svn: 368831
2019-08-14 11:31:05 +00:00
Simon Pilgrim 3f40bdb558 Fix "not all control paths return a value" MSVC warning. NFCI.
llvm-svn: 368830
2019-08-14 11:29:56 +00:00
Simon Pilgrim 8bba4798c2 Fix "not all control paths return a value" MSVC warnings. NFCI.
llvm-svn: 368829
2019-08-14 11:29:16 +00:00
George Rimar bcc00e1afb Recommit r368812 "[llvm/Object] - Convert SectionRef::getName() to return Expected<>"
Changes: no changes. A fix for the clang code will be landed right on top.

Original commit message:

SectionRef::getName() returns std::error_code now.
Returning Expected<> instead has multiple benefits.

For example, it forces user to check the error returned.
Also Expected<> may keep a valuable string error message,
what is more useful than having a error code.
(Object\invalid.test was updated to show the new messages printed.)

This patch makes a change for all users to switch to Expected<> version.

Note: in a few places the error returned was ignored before my changes.
In such places I left them ignored. My intention was to convert the interface
used, and not to improve and/or the existent users in this patch.
(Though I think this is good idea for a follow-ups to revisit such places
and either remove consumeError calls or comment each of them to clarify why
it is OK to have them).

Differential revision: https://reviews.llvm.org/D66089

llvm-svn: 368826
2019-08-14 11:10:11 +00:00
Fangrui Song 8caa0aaa4d [AsmPrinter] Delete redundant .type foo, @function when emitting an ifunc
In MCAsmStreamer:

.type foo,@function   # <--- this is redundant
.type foo,@gnu_indirect_function

In MCELFStreamer, the latter STT_GNU_IFUNC overrides STT_FUNC.

llvm-svn: 368823
2019-08-14 10:30:27 +00:00
Roman Lebedev 32f1e1a01d [InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI
I'd like to use it elsewhere, hopefully without reinventing the wheel.
No functional change intended so far.

llvm-svn: 368820
2019-08-14 09:57:20 +00:00
George Rimar 468919e182 Revert r368812 "[llvm/Object] - Convert SectionRef::getName() to return Expected<>"
It broke clang BB: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16455

llvm-svn: 368813
2019-08-14 08:56:55 +00:00
George Rimar a0c6a35714 [llvm/Object] - Convert SectionRef::getName() to return Expected<>
SectionRef::getName() returns std::error_code now.
Returning Expected<> instead has multiple benefits.

For example, it forces user to check the error returned.
Also Expected<> may keep a valuable string error message,
what is more useful than having a error code.
(Object\invalid.test was updated to show the new messages printed.)

This patch makes a change for all users to switch to Expected<> version.

Note: in a few places the error returned was ignored before my changes.
In such places I left them ignored. My intention was to convert the interface
used, and not to improve and/or the existent users in this patch.
(Though I think this is good idea for a follow-ups to revisit such places
and either remove consumeError calls or comment each of them to clarify why
it is OK to have them).

Differential revision: https://reviews.llvm.org/D66089

llvm-svn: 368812
2019-08-14 08:46:54 +00:00
Dorit Nuzman 491ca2425d [LV] Fold-tail flag
This is the compiler-flag equivalent of the Predicate pragma
(https://reviews.llvm.org/D65197), to direct the vectorizer to fold the
remainder-loop into the main-loop using predication.

Differential Revision: https://reviews.llvm.org/D66108

Reviewers: Ayal, hsaito, fhahn, SjoerdMeije
llvm-svn: 368801
2019-08-14 05:22:20 +00:00
David L. Jones d4edd9d97e Revert '[LICM] Make Loop ICM profile aware' and 'Fix pass dependency for LICM'
This reverts r368526 (git commit 7e71aa24bc)
This reverts r368542 (git commit cb5a90fd31)

llvm-svn: 368800
2019-08-14 04:50:33 +00:00
John McCall a318c55073 Coroutines: adjust for SVN r358739
CallSite has been removed in favour of CallBase.  Adjust the coroutine split to
account for that.

llvm-svn: 368798
2019-08-14 03:54:25 +00:00
John McCall 3bbf207fbc Don't run a full verifier pass in coro-splitting's private pipeline.
Potentially addresses rdar://49022293.

llvm-svn: 368797
2019-08-14 03:54:18 +00:00
John McCall 5f60b68c68 Remove unreachable blocks before splitting a coroutine.
The suspend-crossing algorithm is not correct in the presence of uses
that cannot be reached on some successor path from their defs.

llvm-svn: 368796
2019-08-14 03:54:13 +00:00
John McCall 2133feec93 Support swifterror in coroutine lowering.
The support for swifterror allocas should work in all lowerings.
The support for swifterror arguments only really works in a lowering
with prototypes where you can ensure that the prototype also has a
swifterror argument; I'm not really sure how it could possibly be
made to work in the switch lowering.

llvm-svn: 368795
2019-08-14 03:54:05 +00:00
John McCall d47801e718 In coro.retcon lowering, don't explode if the optimizer messes around with the linkage of the prototype or the exact types of the yielded values.
llvm-svn: 368793
2019-08-14 03:53:52 +00:00
John McCall ac40483276 Fix a use-after-free in the coro.alloca treatment.
llvm-svn: 368792
2019-08-14 03:53:46 +00:00
John McCall 62a5dde0c2 Add intrinsics for doing frame-bound dynamic allocations within a coroutine.
These rely on having an allocator provided to the coroutine and thus,
for now, only work in retcon lowerings.

llvm-svn: 368791
2019-08-14 03:53:40 +00:00
John McCall 137b50f0c3 Guard dumps in the coro intrinsic validation logic behind NDEBUG checks. dump() is not guaranteed to be defined in all builds.
llvm-svn: 368790
2019-08-14 03:53:31 +00:00
John McCall 3829214185 Generalize llvm.coro.suspend.retcon to allow an arbitrary number of arguments to be passed back to the continuation function.
llvm-svn: 368789
2019-08-14 03:53:26 +00:00
John McCall 94010b2b7f Extend coroutines to support a "returned continuation" lowering.
A quick contrast of this ABI with the currently-implemented ABI:

- Allocation is implicitly managed by the lowering passes, which is fine
  for frontends that are fine with assuming that allocation cannot fail.
  This assumption is necessary to implement dynamic allocas anyway.

- The lowering attempts to fit the coroutine frame into an opaque,
  statically-sized buffer before falling back on allocation; the same
  buffer must be provided to every resume point.  A buffer must be at
  least pointer-sized.

- The resume and destroy functions have been combined; the continuation
  function takes a parameter indicating whether it has succeeded.

- Conversely, every suspend point begins its own continuation function.

- The continuation function pointer is directly returned to the caller
  instead of being stored in the frame.  The continuation can therefore
  directly destroy the frame when exiting the coroutine instead of having
  to leave it in a defunct state.

- Other values can be returned directly to the caller instead of going
  through a promise allocation.  The frontend provides a "prototype"
  function declaration from which the type, calling convention, and
  attributes of the continuation functions are taken.

- On the caller side, the frontend can generate natural IR that directly
  uses the continuation functions as long as it prevents IPO with the
  coroutine until lowering has happened.  In combination with the point
  above, the frontend is almost totally in charge of the ABI of the
  coroutine.

- Unique-yield coroutines are given some special treatment.

llvm-svn: 368788
2019-08-14 03:53:17 +00:00
Aditya Nandakumar c65ac865c3 [GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index
https://reviews.llvm.org/D66182

llvm-svn: 368781
2019-08-14 01:23:33 +00:00
Amara Emerson 2a312fc989 [AArch64][GlobalISel] RBS: Treat s128s like vectors when unmerging.
The destinations should be FPRs (for now).

Differential Revision: https://reviews.llvm.org/D66184

llvm-svn: 368775
2019-08-13 23:51:20 +00:00
Eli Friedman b5eb3e1e82 [AArch64] Remove incorrect usage of MONonTemporal.
This has no effect at the moment, but might matter if we try to
implement non-temporal loads in the future.

llvm-svn: 368770
2019-08-13 23:12:14 +00:00
Aditya Nandakumar 615eee6402 [GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources
https://reviews.llvm.org/D66171

llvm-svn: 368753
2019-08-13 21:49:11 +00:00
Xiangling Liao a8c624a1c4 [AIX]Lowering global address for 32/64bit small/large code models
This patch implements global address lowering for 32/64 bit with small/large code models.
    1.For 32bit large code model on AIX, there are newly added pseudo opcode LWZtocL & ADDIStocHA32, the support of which on MC layer will be
       provided by future patches.
    2.The default code model on AIX should be small code model.
    3.Since AIX does not have medium code model, "report_fatal_error" when users specify it.

    Differential Revision: https://reviews.llvm.org/D63547

llvm-svn: 368744
2019-08-13 20:29:01 +00:00
Tim Renouf 10db641aab [AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'
That change (r363670) could leave a copy from vgpr to sgpr. Fixed.

Differential Revision: https://reviews.llvm.org/D66133

Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4
llvm-svn: 368736
2019-08-13 18:57:55 +00:00
David Green a655393f17 [ARM] Add MVE beats vector cost model
The MVE architecture has the idea of "beats", where a vector instruction can be
executed over several ticks of the architecture. This adds a similar system
into the Arm backend cost model, multiplying the cost of all vector
instructions by a factor.

This factor essentially becomes the expected difference between scalar code
and vector code, on average. MVE Vector instructions can also overlap so the a
true cost of them is often lower. But equally scalar instructions can in some
situations be dual issued, or have other optimisations such as unrolling or
make use of dsp instructions. The default is chosen as 2. This should not
prevent vectorisation is a most cases (as the vector instructions will still be
doing at least 4 times the work), but it will help prevent over vectorising in
cases where the benefits are less likely.

This adds things so far to the obvious places in ARMTargetTransformInfo, and
updates a few related costs like not treating float instructions as cost 2 just
because they are floats.

Differential Revision: https://reviews.llvm.org/D66005

llvm-svn: 368733
2019-08-13 18:12:08 +00:00
Wenlei He d328954467 [llvm-profdata] Profile dump for compact binary format
Summary: Fix "llvm-profdata show" so it can work with compact binary format profile. The change is to mark all functions "used" so SampleProfileReaderCompactBinary::read will read in all profiles available for dumping. The function names will be MD5 hash for compact binary format.

Reviewers: wmi, davidxl, danielcdh

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65162

llvm-svn: 368731
2019-08-13 17:56:08 +00:00
Steven Wu 9e51fb6c57 [AutoUpgrader] Make ArcRuntime Autoupgrader more conservative
Summary:
This is a tweak to r368311 and r368646 which auto upgrades the calls to
objc runtime functions to objc runtime intrinsics, in order to make sure
that the auto upgrader does not trigger with up-to-date bitcode.

It is possible for bitcode that is up-to-date to contain direct calls to
objc runtime function and those are not inserted by compiler as part of
ARC and they should not be upgraded. Now auto upgrader only triggers as
when the old style of ARC marker is used so it is guaranteed that it
won't trigger on update-to-date bitcode.

This also means it won't do this upgrade for bitcode from llvm-8 and
llvm-9, which preserves the behavior of those releases. Ideally they
should be upgraded as well but it is more important to make sure
AutoUpgrader will not trigger on up-to-date bitcode.

Reviewers: ahatanak, rjmccall, dexonsmith, pete

Reviewed By: dexonsmith

Subscribers: hiraditya, jkorous, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66153

llvm-svn: 368730
2019-08-13 17:52:21 +00:00
Heejin Ahn 64517a6419 Use Register over unsigned in LateEHPrepare (NFC)
Summary:
While D65962 is pending for review, I landed D65475 that added one more
use of `unsigned`. Changed it to `Register`.

Reviewers: dsanders

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66064

llvm-svn: 368727
2019-08-13 17:35:44 +00:00
David Bolvansky 038d604f4f [SimplifyLibCalls] Add noalias from known callsites
Summary:
Should be fine for memcpy, strcpy, strncpy.


Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66135

llvm-svn: 368724
2019-08-13 17:18:46 +00:00
Nikita Popov 2a4f26b4c2 [ValueTracking] Improve reverse assumption inference
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.

Original patch by arielb1.

Differential Revision: https://reviews.llvm.org/D37215

llvm-svn: 368723
2019-08-13 17:15:42 +00:00
Hubert Tong 0996705009 Reland r368691: "[AIX] Implement LR prolog/epilog save/restore"
Trying again with the code changes (and not just the new test).

Summary:
This patch fixes the offsets of fields in the stack frame linkage save
area for AIX.

Reviewers: sfertile, hubert.reinterpretcast, jasonliu, Xiangling_L, xingxue, ZarkoCA, daltenty

Reviewed By: hubert.reinterpretcast

Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64424

Patch by Chris Bowler!

llvm-svn: 368721
2019-08-13 17:05:53 +00:00
David Tenty 9bf01e53a3 [NFC][AIX] Use assert instead of llvm_unreachable
Addresses post-commit comments on https://reviews.llvm.org/D64825. Use
assert instead of llvm_unreachable to check if invalid csect types are being
generated. Use report_fatal_error on unimplemented XCOFF features.

Differential Revision: https://reviews.llvm.org/D64825

llvm-svn: 368720
2019-08-13 17:04:51 +00:00
Jonas Devlieghere 57ae300562 [Dwarf] Complete the list of type tags.
An incorrect verification error revealed that the list of type tags was
incomplete. This patch adds the missing types by adding a tag kind to
the Dwarf.def file, which is used by the `isType` function.

A test was added for the original verification error.

Differential revision: https://reviews.llvm.org/D65914

llvm-svn: 368718
2019-08-13 17:00:54 +00:00
David Bolvansky 90a30fdcc3 [SLC] Improve dereferenceable bytes annotation
llvm-svn: 368715
2019-08-13 16:44:16 +00:00
Matt Arsenault 28215caa60 GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES
Odd sized vectors aren't handled yet.

llvm-svn: 368713
2019-08-13 16:26:28 +00:00
Momchil Velikov 114c37e72a [ARM] Fix detection of duplicates when parsing reg list operands
Differential Revision: https://reviews.llvm.org/D65957

llvm-svn: 368712
2019-08-13 16:13:00 +00:00
Momchil Velikov f990e4a4c7 [ARM] Fix encoding of APSR in CLRM instruction
The APSR is encoded by setting bit 15 in the register list of the CLRM
instruction (cf. https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf).

Differential Revision: https://reviews.llvm.org/D65873

llvm-svn: 368711
2019-08-13 16:12:46 +00:00
Matt Arsenault 690645bda0 GlobalISel: Implement lower for G_SHUFFLE_VECTOR
llvm-svn: 368709
2019-08-13 16:09:07 +00:00
Lang Hames 52a34a78d9 [ORC] Refactor definition-generation, add a generator for static libraries.
This patch replaces the JITDylib::DefinitionGenerator typedef with a class of
the same name, and adds support for attaching a sequence of DefinitionGeneration
objects to a JITDylib.

This patch also adds a new definition generator,
StaticLibraryDefinitionGenerator, that can be used to add symbols fom a static
library to a JITDylib. An object from the static library will be added (via
a supplied ObjectLayer reference) whenever a symbol from that object is
referenced.

To enable testing, lli is updated to add support for the --extra-archive option
when running in -jit-kind=orc-lazy mode.

llvm-svn: 368707
2019-08-13 16:05:18 +00:00
Matt Arsenault 0a04a06250 GlobalISel: Add more verifier checks for G_SHUFFLE_VECTOR
llvm-svn: 368705
2019-08-13 15:52:21 +00:00
Matt Arsenault 5af9cf042f GlobalISel: Change representation of shuffle masks
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.

Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.

llvm-svn: 368704
2019-08-13 15:34:38 +00:00
Roman Lebedev 676594305a [CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case)
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

One huge caveat: this signed case is only valid for positive divisors.

While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now if `INT_MIN` is encountered, we bailout.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.

This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case)

Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65366

llvm-svn: 368702
2019-08-13 14:57:37 +00:00
Roman Lebedev f4de7eda4a [TargetLowering][NFC] prepareUREMEqFold(): fixup comment
The comment initially matched the code, but the code was incorrect
and was fixed after the initial revert back back when it was introduced,
but the comment was never updated.

llvm-svn: 368701
2019-08-13 14:57:08 +00:00
Roman Lebedev 73f702ff19 [InstCombine] Non-canonical clamp-like pattern handling
Summary:
Given a pattern like:
```
%old_cmp1 = icmp slt i32 %x, C2
%old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high
%old_x_offseted = add i32 %x, C1
%old_cmp0 = icmp ult i32 %old_x_offseted, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```
it can be rewritten as more canonical pattern:
```
%new_cmp1 = icmp slt i32 %x, -C1
%new_cmp2 = icmp sge i32 %x, C0-C1
%new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low
```
Iff `-C1 s<= C2 s<= C0-C1`
Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result)
Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result)

If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses.
NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting.

So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants:

|     | ULT                            | UGE                             | UGT                             |
| SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH  |
| SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk  | https://rise4fun.com/Alive/PlzS |
| SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY  | https://rise4fun.com/Alive/KrzC |
{F9730206}

This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch.
This patch requires D65530.

Reviewers: spatel, nikic, xbolva00, dmgreen

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, dmgreen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65765

llvm-svn: 368687
2019-08-13 12:49:28 +00:00
Roman Lebedev 0410489a34 [InstCombine][NFC] Rename IsFreeToInvert() -> isFreeToInvert() for consistency
As per https://reviews.llvm.org/D65530#inline-592325

llvm-svn: 368686
2019-08-13 12:49:16 +00:00
Roman Lebedev 2635c324da [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible
Summary:
This is rather unconventional..

As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.

As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit match and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is because the ICmp we need to invert is not single-use - we give up.

We indeed can't perform this fold at will, the general rule is that
we should not increase instruction count in InstCombine,

But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count did not increase.

For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S

Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].

Reviewers: spatel, nikic, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65530

llvm-svn: 368685
2019-08-13 12:49:06 +00:00
Simon Pilgrim e7b350a5d1 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368662

Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368663
2019-08-13 11:11:42 +00:00
Simon Pilgrim 1a8d790cf5 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368662
2019-08-13 10:51:39 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
David Bolvansky 39130314fe [SimplifyLibCalls] Add dereferenceable bytes from known callsites
Summary:
int mm(char *a, char *b) {
    return memcmp(a,b,16);
}

Currently:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16)
  ret i32 %call
}

After patch:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* dereferenceable(16)  %a, i8* dereferenceable(16)  %b, i64 16)
  ret i32 %call
}




Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: javed.absar, spatel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66079

llvm-svn: 368657
2019-08-13 09:11:49 +00:00
Qiu Chaofan 4fb99a3330 [PowerPC] Fix ICE when truncating some vectors
The legalizer would hit an assertion on PowerPC platform when truncating
a vector whose size is not power of 2.  This patch is to add a check to
prevent vectors with such odd-size elements from being custom lowered.

Reviewed By: Hal Finkel

Differential Revision: https://reviews.llvm.org/D65261

llvm-svn: 368654
2019-08-13 07:53:29 +00:00
Amara Emerson 72c81b94cb [AArch64][GlobalISel] Replace explicit vreg creation with implicit using SrcOp. NFC.
llvm-svn: 368653
2019-08-13 06:55:32 +00:00
Amara Emerson e14c91b71a [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained.
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.

AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.

Differential Revision: https://reviews.llvm.org/D65984

llvm-svn: 368652
2019-08-13 06:26:59 +00:00
Serge Pavlov 2a09b9acfb Added unit tests to check supported rounding modes
Also added fixed misspelled metadata name.

Differential Revision: https://reviews.llvm.org/D66073

llvm-svn: 368650
2019-08-13 05:21:18 +00:00
Aditya Nandakumar 70fdfed45f [GlobalISel]: Add KnownBits for G_XOR
https://reviews.llvm.org/D66119

llvm-svn: 368648
2019-08-13 04:32:33 +00:00
Yevgeny Rouban 8b996dc16e Verifier: check prof branch_weights
This patch is to check some of constraints on
!pro branch_weights metadata:
https://llvm.org/docs/BranchWeightMetadata.html

Reviewers: asbirlea, reames, chandlerc
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D61179

llvm-svn: 368647
2019-08-13 04:03:38 +00:00
Akira Hatanaka 3c7c053145 Do not call replaceAllUsesWith to upgrade calls to ARC runtime functions
to intrinsic calls

This fixes a bug in r368311.

It turns out that the ARC runtime functions in the IR can have pointer
parameter types that are not i8* or i8**. Instead of RAUWing normal
functions with intrinsics, manually bitcast the arguments before passing
them to the intrinsic functions and bitcast the return value back to the
type of the original call instruction.

This recommits r368634, which was reverted in r368637. The loop in the
patch was iterating over uses of a function and deleting function calls
inside it, which caused bots to crash.

rdar://problem/54125406

Differential Revision: https://reviews.llvm.org/D66047

llvm-svn: 368646
2019-08-13 01:23:06 +00:00
Stanislav Mekhanoshin 438315bf69 [AMDGPU] Fix msan failure in printf lowering
llvm-svn: 368645
2019-08-13 01:07:27 +00:00
Daniel Sanders a58a27513b Eliminate implicit Register->unsigned conversions in VirtRegMap. NFC
Summary:
This was mostly an experiment to assess the feasibility of completely
eliminating a problematic implicit conversion case in D61321 in advance of
landing that* but it also happens to align with the goal of propagating the
use of Register/MCRegister instead of unsigned so I believe it makes sense
to commit it.

The overall process for eliminating the implicit conversions from
Register/MCRegister -> unsigned was to:
1. Add an explicit conversion to support genuinely required conversions to
   unsigned. For example, using them as an index for IndexedMap. Sadly it's
   not possible to have an explicit and implicit conversion to the same
   type and only deprecate the implicit one so I called the explicit
   conversion get().
2. Temporarily annotate the implicit conversion to unsigned with
   LLVM_ATTRIBUTE_DEPRECATED to make them visible
3. Eliminate implicit conversions by propagating Register/MCRegister/
   explicit-conversions appropriately
4. Remove the deprecation added in 2.

* My conclusion is that it isn't feasible as there's too much code to
  update in one go.

Depends on D65678

Reviewers: arsenm

Subscribers: MatzeB, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65685

llvm-svn: 368643
2019-08-13 00:55:24 +00:00
Akira Hatanaka c109808982 Revert "Do not call replaceAllUsesWith to upgrade calls to ARC runtime functions"
This reverts commit r368634 because it broke a bot.

llvm-svn: 368637
2019-08-13 00:20:36 +00:00
Eric Christopher 4acb4ee767 Move findBBwithCalls to the file it's used in to avoid unused function
warnings.

llvm-svn: 368636
2019-08-13 00:05:01 +00:00
Akira Hatanaka 6817ce24c1 Do not call replaceAllUsesWith to upgrade calls to ARC runtime functions
to intrinsic calls

This fixes a bug in r368311.

It turns out that the ARC runtime functions in the IR can have pointer
parameter types that are not i8* or i8**. Instead of RAUWing normal
functions with intrinsics, manually bitcast the arguments before passing
them to the intrinsic functions and bitcast the return value back to the
type of the original call instruction.

rdar://problem/54125406

llvm-svn: 368634
2019-08-12 23:53:23 +00:00
Stanislav Mekhanoshin 5b32752d10 [AMDGPU] removed unused functions from printf lowering
Differential Revision: https://reviews.llvm.org/D66117

llvm-svn: 368633
2019-08-12 23:32:35 +00:00
Reid Kleckner e9865b9b31 [WinEH] Fix catch block parent frame pointer offset
r367088 made it so that funclets store XMM registers into their local
frame instead of storing them to the parent frame. However, that change
forgot to update the parent frame pointer offset for catch blocks. This
change does that.

Fixes crashes when an exception is rethrown in a catch block that saves
XMMs, as described in https://crbug.com/992860.

llvm-svn: 368631
2019-08-12 23:02:00 +00:00
Juergen Ributzka b978c51ce4 [TextAPI] Fix & Add tests for tbd files version 3.
- There was a simple typo in TextStub code that prevented version 3 files to be read.
- Included a version 3 unit test to handle the differences in the format.
- Also a typo in Error.h inside the comments.

https://reviews.llvm.org/D66041

This patch is from Cyndy Ishida <cyndy_ishida@apple.com>.

llvm-svn: 368630
2019-08-12 23:01:07 +00:00
Daniel Sanders 3836874dbb [risc-v] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Depends on D65919

Reviewers: lenary

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368629
2019-08-12 22:41:02 +00:00
Daniel Sanders 5ae66e56cf [aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Manual fixups in:
AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned*
AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register

Depends on D65919

Reviewers: aemerson

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368628
2019-08-12 22:40:53 +00:00
Daniel Sanders 05c145d694 [webassembly] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Reviewers: aheejin

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for whole review: https://reviews.llvm.org/D65962

llvm-svn: 368627
2019-08-12 22:40:45 +00:00
Stanislav Mekhanoshin ef8f1c473a [AMDGPU] Use PredicateControl in MIMGBaseOpcode. NFC.
This is infrastructural, will be needed for future work.
For some reason it was only used in MIMG_NoSampler, while
needed everywere we use MIMGBaseOpcode if we want to use
predicates.

Differential Revision: https://reviews.llvm.org/D66115

llvm-svn: 368626
2019-08-12 22:32:21 +00:00
Johannes Doerfert 26e58466de [Attributor] Use the cached data layout directly
This removes the warning by using the new DL member.
It also simplifies the code.

llvm-svn: 368625
2019-08-12 22:21:09 +00:00
Craig Topper e07e593782 [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW.
We need AVX512BW to be able to truncate an i16 vector. If we don't
have that we have to extend i16->i32, then trunc, i32->i8. But we
won't be able to remove the min/max if we do that. At least not
without more special handling.

llvm-svn: 368623
2019-08-12 22:18:23 +00:00
Johannes Doerfert acc8079f8e [Attributor][NFC] Add IntegerState raw_ostream << operator
llvm-svn: 368622
2019-08-12 22:07:34 +00:00
Johannes Doerfert ece8190497 [Attributor] Make the InformationCache an Attributor member
The functionality is not changed but the interfaces are simplified and
repetition is removed.

llvm-svn: 368621
2019-08-12 22:05:53 +00:00
Aditya Nandakumar 55371e697c [GISel]: Fix a bug in KnownBits where we should have been using SizeInBits
https://reviews.llvm.org/D66039

We were using getIndexSize instead of getIndexSizeInBits().
Added test case for G_PTRTOINT and G_INTTOPTR.

llvm-svn: 368618
2019-08-12 21:28:12 +00:00
Craig Topper 0761a38e8a [X86] Remove unreachable code from LowerTRUNCATE. NFC
All three 256->128 bit cases were already handled above.

Noticed while looking at the coverage report.

llvm-svn: 368609
2019-08-12 19:26:45 +00:00
Craig Topper a3605baaff [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores.
If we're after type legalize, we should make sure we won't create
a store with an illegal type when we separate the AVG pattern
from the truncating store.

I don't know of a way to fail for this today. Just noticed while
I was in the vicinity.

llvm-svn: 368608
2019-08-12 19:26:37 +00:00
Craig Topper 1b02909847 [X86] Simplify creation of saturating truncating stores.
We just need to check if the truncating store is legal
instead of going through isSATValidOnAVX512Subtarget.

llvm-svn: 368607
2019-08-12 19:26:30 +00:00
Craig Topper 3f4e9b156d [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC
We have no custom trunc stores on X86.

llvm-svn: 368606
2019-08-12 19:26:22 +00:00
Wenlei He 4b99b58a84 [ThinLTO][AutoFDO] Fix memory corruption due to race condition from thin backends
Summary:
This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile.
GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data.

Reviewers: wmi, davidxl, danielcdh

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65848

llvm-svn: 368596
2019-08-12 17:45:14 +00:00
Craig Topper 09d5d15339 [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256.
Under this config, the v16f32 type we try to use isn't to a register
class so the getRegClassFor call will fail.

llvm-svn: 368594
2019-08-12 17:43:26 +00:00
David Green 86876422ef [ARM] sext of a load is free
This teaches the cost model that the sext or zext of a load is going to be
free.

Differential Revision: https://reviews.llvm.org/D66006

llvm-svn: 368593
2019-08-12 17:39:56 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the according pass from the HSAIL compiler.
It parses printf calls and setup runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
David Green 3e39f39ad9 [ARM] MVE shuffle broadcast costs
A VDUP will perform a vector broadcast in a single instruction. Update the cost
model for MVE accordingly.

Code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63448

llvm-svn: 368589
2019-08-12 16:54:07 +00:00
David Green 83bbfaa5e4 [ARM] Put some of the TTI costmodel behind hasNeon calls.
This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon()
checks, now that we have MVE, and updates all the tests accordingly.

Differential Revision: https://reviews.llvm.org/D63447

llvm-svn: 368587
2019-08-12 15:59:52 +00:00
Sean Fertile 29141da75e [XCOFF] Use a single symbolic constant for the size of an embeded name. [NFC]
Convert SymbolNameSize and SectionNameSize into just `NameSize`. The length of
a name embeded in a symbol table entry or section header table entry is length 8
for Sections, Symbols and Files. No need to have a distinct constant for each
one. Also removes the Size argument to 'generateStringRef' as the size is
always 'XCOFF::NameSize'.

llvm-svn: 368584
2019-08-12 15:27:40 +00:00
Hans Wennborg a45f301f7a Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"
It caused assertions to fire when building Chromium:

  lib/CodeGen/LiveDebugValues.cpp:331: bool
  {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
  `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.

See https://crbug.com/992871#c3 for how to reproduce.

> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
>
> To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368579
2019-08-12 14:23:13 +00:00
Kang Zhang 489efc68a5 Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
llvm-svn: 368574
2019-08-12 14:00:31 +00:00
Sam Elliott fee242aed4 [RISCV] Fix ICE in isDesirableToCommuteWithShift
Summary:
Ana Pazos reported a bug where we were not checking that an APInt would
fit into 64-bits before calling `getSExtValue()`. This caused asserts when
compiling large constants, such as i128s, as happens when compiling compiler-rt.

This patch adds a testcase and makes the callback less error-prone.

Reviewers: apazos, asb, luismarques

Reviewed By: luismarques

Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66081

llvm-svn: 368572
2019-08-12 13:51:00 +00:00
David Bolvansky 20d37fab82 [InstCombine] x /c fabs(x) -> copysign(1.0, x)
Summary:
x / fabs(x) -> copysign(1.0, x)
fabs(x) / x -> copysign(1.0, x)

Reviewers: spatel, foad, RKSimon, efriedma

Reviewed By: spatel

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65898

llvm-svn: 368570
2019-08-12 13:43:35 +00:00
David Stenberg 9b29ec58b7 [DebugInfo] Remove call sites when eliminating unreachable blocks
Summary:
When eliminating an unreachable block we must remove any call site
information for calls residing in the block.

This was originally found on a downstream target, and the attached x86
test case was produced by hand-modifying some MIR.

Reviewers: aprantl, asowda, NikolaPrica, djtodoro, ivanbaev, vsk

Reviewed By: NikolaPrica, vsk

Subscribers: vsk, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D64500

llvm-svn: 368566
2019-08-12 13:22:29 +00:00
Kang Zhang 342fb0db6d [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:

In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D63972

llvm-svn: 368565
2019-08-12 13:15:31 +00:00
Hans Wennborg 5b96d4655c Revert r368509 "[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks"
> In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
> But the `early-ret` pass is before `block-placement`, we don't want to run it again.
> This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.
>
> Reviewed By: efriedma
>
> Differential Revision: https://reviews.llvm.org/D63972

This also revertes follow-ups r368514 and r368532.

llvm-svn: 368560
2019-08-12 12:43:51 +00:00
Simon Pilgrim 182249daee [X86][SSE] ComputeKnownBits - add basic PSADBW handling
llvm-svn: 368558
2019-08-12 12:19:19 +00:00
Roman Lebedev ccdad6ef48 [InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfail (PR42962)
Instead of matching value and then blindly casting to BinaryOperator
just to get the opcode, just match instruction and do no cast.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42962

llvm-svn: 368554
2019-08-12 11:28:02 +00:00
Simon Pilgrim 05e8209e33 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE
llvm-svn: 368553
2019-08-12 10:56:05 +00:00
Pengfei Wang e28cbbd5d4 [X86] Support -march=tigerlake
Support -march=tigerlake for x86.
Compare with Icelake Client, It include 4 more new features ,they are
avx512vp2intersect, movdiri, movdir64b, shstk.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D65840

llvm-svn: 368543
2019-08-12 01:29:46 +00:00
Wenlei He cb5a90fd31 Fix pass dependency for LICM
Expected to address buildbot failure http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16285 caused by D65060.

llvm-svn: 368542
2019-08-11 22:54:05 +00:00
Bjorn Pettersson 27038a3780 [SelectionDAG] Widen vector results of SMULFIX/UMULFIX/SMULFIXSAT
Summary:
After the commits that changed x86 backend to widen vectors
instead of using promotion some of our downstream tests
started to fail. It was noticed that WidenVectorResult has
been missing support for SMULFIX/UMULFIX/SMULFIXSAT. This
patch adds the missing functionality.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66051

llvm-svn: 368540
2019-08-11 19:27:06 +00:00
Craig Topper ce6a2cf966 [X86] Simplify some of the type checks in combineSubToSubus.
If we have SSE2 we can handle any i8/i16 type and let
type legalization deal with it.

llvm-svn: 368538
2019-08-11 17:36:49 +00:00
Craig Topper 637964bfd8 [X86] Don't use SplitOpsAndApply for ISD::USUBSAT.
Target independent type legalization and custom lowering
should be able to handle it.

llvm-svn: 368537
2019-08-11 17:36:45 +00:00
Kang Zhang b1a62d168f [NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement::optimizeBranches()
This will pass EXPENSIVE check.

llvm-svn: 368532
2019-08-11 12:58:50 +00:00
David Green 11c4602fce [MVE] Don't try to unroll vectorised MVE loops
Due to the nature of the beat system in the MVE architecture, along with tail
predication and low-overhead loops, unrolling has less benefit compared to
normal loops. You can not, for example, hide the latency of a load with other
instructions as you can for scalar code. Preventing unrolling also makes the
code easier to read and reason about.

So if a loop contains vector code, don't enable the runtime unrolling. At least
for the time being.

Differential Revision: https://reviews.llvm.org/D65803

llvm-svn: 368530
2019-08-11 08:53:18 +00:00
David Green 44f8d635e2 [ARM] Permit auto-vectorization using MVE
With enough codegen complete, we can now correctly report the number and size
of vector registers for MVE, allowing auto vectorisation. This also allows FP
auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE
FP arithmetic and parity between scalar and vector FP behaviour.

Patch by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63728

llvm-svn: 368529
2019-08-11 08:42:57 +00:00
Heejin Ahn 831efe0e0f Fix __clang_call_termiante's argument for foreign exceptions
Summary:
When exceptions are repeatedly thrown in the middle of handling another
exception, we call `__clang_call_terminate` with the exception pointer
(i32) as an argument. But in case of foreign exceptions, we don't have
the pointer, so we call the function with 0. (This requires
`__clang_call_terminate` can deal with 0 argument, which will be done
later)

But previously the 0 argument was not added as a `i32.const 0` but an
immediate by mistake, causing the `call` instruction to take not an i32
but rather an exnref, because an `exnref` is left on top of the value
stack if `br_on_exn` is not taken.

```
block i32
  br_on_exn 0, __cpp_exception
                               ;; exnref is on top of stack now
  i32.const 0                  ;; This was missing!
  call __clang_call_terminate
  unreachable
end
call __clang_call_terminate    ;; This takes i32 extracted by br_on_exn
```

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65475

llvm-svn: 368527
2019-08-11 06:24:07 +00:00
Wenlei He 7e71aa24bc [LICM] Make Loop ICM profile aware
Summary:
Hoisting/sinking instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes Loop ICM profile aware - it now checks block frequency to make sure hoisting/sinking anly moves instruction to colder block.

Test Plan:

ninja check

Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk

Reviewed By: asbirlea

Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65060

llvm-svn: 368526
2019-08-11 06:05:35 +00:00
Wenlei He d664072dd5 Revert "test commit"
This reverts commit ad92a4a2769425ad0d39ac1dbb6282f6f51a1af7.

llvm-svn: 368525
2019-08-11 05:59:20 +00:00
Wenlei He 7bd327da00 test commit
llvm-svn: 368524
2019-08-11 05:50:28 +00:00
Craig Topper 9758e0e1bf [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization.
llvm-svn: 368523
2019-08-11 02:17:18 +00:00
Craig Topper 0f74b82ef1 [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization.
The test case that changed is probably better served through
allowing combineTruncatedArithmetic to create narrow vectors. It
also appears InstCombine would have simplified this test case
to remove the zext and trunc anyway.

llvm-svn: 368522
2019-08-11 02:08:38 +00:00
Roman Lebedev 96474d17c6 [InstCombine][NFC] Use SimplifyAddInst() instead of SimplifyBinOp(Instruction::BinaryOps::Add, )
llvm-svn: 368521
2019-08-10 19:29:10 +00:00
Roman Lebedev a8d20b4467 [InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant
If one of the values being shifted is a constant, since the new shift
amount is known-constant, the new shift will end up being constant-folded
so, we don't need that one-use restriction then.

llvm-svn: 368519
2019-08-10 19:28:54 +00:00
Roman Lebedev 64fe806c4e [InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count. So there is no need for that restriction.

llvm-svn: 368518
2019-08-10 19:28:44 +00:00
Simon Pilgrim ec128709f0 [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG
On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.

This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend codes.

Differential Revision: https://reviews.llvm.org/D65741

llvm-svn: 368515
2019-08-10 16:46:07 +00:00
Kang Zhang 555f7495df [NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement::optimizeBranches()
llvm-svn: 368514
2019-08-10 16:23:17 +00:00
Sanjay Patel 21c15ef384 [Reassociate] try harder to convert negative FP constants to positive
This is an extension of a transform that tries to produce positive floating-point
constants to improve canonicalization (and hopefully lead to more reassociation
and CSE).

The original patches were:
D4904
D5363 (rL221721)

But as the test diffs show, these were limited to basic patterns by walking from
an instruction to its single user rather than recursively moving up the def-use
sequence. No fast-math is required here because we're only rearranging implicit
FP negations in intermediate ops.

A motivating bug is:
https://bugs.llvm.org/show_bug.cgi?id=32939

Differential Revision: https://reviews.llvm.org/D65954

llvm-svn: 368512
2019-08-10 13:17:54 +00:00
Kang Zhang 36cd84bdd9 [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:

In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D63972

llvm-svn: 368509
2019-08-10 09:58:52 +00:00
Craig Topper 74c43a2277 [X86] Match the IR pattern form movmsk on SSE1 only targets where v4i32 isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where its obvious the input won't be scalarized resulting in building a vector just do to the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.

This fixes the case from PR42870. Buts its probably still broken the presence of logic ops feeding the movmsk pattern which would further hide the v4f32 type.

Reviewers: spatel, RKSimon, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65689

llvm-svn: 368506
2019-08-10 07:51:13 +00:00
Craig Topper a8e5e73711 [X86] Improve the diagnostic for larger than 4-bit immediate for vpermil2pd/ps. Only allow MCConstantExprs.
llvm-svn: 368505
2019-08-10 04:28:52 +00:00
Luo, Yuanke c6c86f4f81 [X86] Fix stack probe issue on windows32.
Summary:
On windows if the frame size exceed 4096 bytes, compiler need to
generate a call to _alloca_probe. X86CallFrameOptimization pass
changes the reserved stack size and cause of stack probe function
not be inserted. This patch fix the issue by detecting the call
frame size, if the size exceed 4096 bytes, drop X86CallFrameOptimization.

Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon

Reviewed By: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65923

llvm-svn: 368503
2019-08-10 02:49:02 +00:00
Fedor Sergeev 92e160abab [MemDep] allow to select block-scan-limit when constructing MemoryDependenceAnalysis
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).

Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806

llvm-svn: 368502
2019-08-10 01:23:38 +00:00
Peter Collingbourne 0e497d1554 cfi-icall: Allow the jump table to be optionally made non-canonical.
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:

- There is a performance and code size overhead associated with each
  exported function, because each such function must have an associated
  jump table entry, which must be emitted even in the common case where the
  function is never address-taken anywhere in the program, and must be used
  even for direct calls between DSOs, in addition to the PLT overhead.

- There is no good way to take a CFI-valid address of a function written in
  assembly or a language not supported by Clang. The reason is that the code
  generator would need to insert a jump table in order to form a CFI-valid
  address for assembly functions, but there is no way in general for the
  code generator to determine the language of the function. This may be
  possible with LTO in the intra-DSO case, but in the cross-DSO case the only
  information available is the function declaration. One possible solution
  is to add a C wrapper for each assembly function, but these wrappers can
  present a significant maintenance burden for heavy users of assembly in
  addition to adding runtime overhead.

For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.

This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.

Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking a address for the function that will pass CFI checks.

Fixes PR41972.

Differential Revision: https://reviews.llvm.org/D65629

llvm-svn: 368495
2019-08-09 22:31:59 +00:00
Sanjay Patel 26b2c11451 [DAGCombiner] exclude x*2.0 from normal negation profitability rules
This is the codegen part of fixing:
https://bugs.llvm.org/show_bug.cgi?id=32939

Even with the optimal/canonical IR that is ideally created by D65954,
we would reverse that transform in DAGCombiner and end up with the same
asm on AArch64 or x86.

I see 2 options for trying to correct this:

  1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch).
  2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case
     transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul()
     that matches the transform done by DAGCombiner.

This seems like the less intrusive patch, but if there's some other reason to
prefer 1 option over the other, we can change to the other option.

Differential Revision: https://reviews.llvm.org/D66016

llvm-svn: 368490
2019-08-09 21:37:32 +00:00
Daniel Sanders e9a57c2b23 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00
Eric Christopher db2f17d362 Remove variable only used in an assert.
llvm-svn: 368486
2019-08-09 21:02:47 +00:00
Craig Topper 6cb05ca044 [X86] Remove custom handling for extloads from LowerLoad.
We don't appear to need this with widening legalization.

llvm-svn: 368479
2019-08-09 20:27:22 +00:00
Bill Wendling 79176a2542 [CodeGen] Require a name for a block addr target
Summary:
A block address may be used in inline assembly. In which case it
requires a name so that the asm parser has something to parse. Creating
a name for every block address is a large hammer, but is necessary
because at the point when a temp symbol is created we don't necessarily
know if it's used in inline asm. This ensures that it exists regardless.

Reviewers: nickdesaulniers, craig.topper

Subscribers: nathanchance, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65352

llvm-svn: 368478
2019-08-09 20:18:30 +00:00
Bill Wendling 1b10438875 [MC] Don't recreate a label if it's already used
Summary:
This patch keeps track of MCSymbols created for blocks that were
referenced in inline asm. It prevents creating a new symbol which
doesn't refer to the block.

Inline asm may have a reference to a label. The asm parser however
doesn't recognize it as a label and tries to create a new symbol. The
result being that instead of the original symbol (e.g. ".Ltmp0") the
parser replaces it in the inline asm with the new one (e.g. ".Ltmp00")
without updating it in the symbol table. So the machine basic block
retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one
(".Ltmp00").

Reviewers: nickdesaulniers, craig.topper

Subscribers: nathanchance, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65304

llvm-svn: 368477
2019-08-09 20:16:31 +00:00
Evandro Menezes 59fbe516bd [InstCombine] Refactor optimizeExp2() (NFC)
Refactor `LibCallSimplifier::optimizeExp2()` to use the new
`emitBinaryFloatFnCall()` version that fetches the function name from TLI.

llvm-svn: 368457
2019-08-09 17:22:56 +00:00
Evandro Menezes 8a21214174 [Transforms] Add a emitBinaryFloatFnCall() version that fetches the function name from TLI
Add the counterpart to a similar function for single operands.

Differential revision: https://reviews.llvm.org/D65976

llvm-svn: 368453
2019-08-09 17:06:46 +00:00
Sunil Srivastava 27f6f2f88b Print reasonable representations of type names in llvm-nm, readelf and readobj
For type values that do not have proper names, print reasonable representation
in llvm-nm, llvm-readobj and llvm-readelf, matching GNU tools.s

Fixes PR41713.

Differential Revision: https://reviews.llvm.org/D65537

llvm-svn: 368451
2019-08-09 16:54:51 +00:00
Evandro Menezes c6c00cdf2e [Transforms] Rename hasUnaryFloatFn() and getUnaryFloatFn() (NFC)
Rename `hasUnaryFloatFn()` to `hasFloatFn()` and `getUnaryFloatFn()` to `getFloatFnName()`.

llvm-svn: 368449
2019-08-09 16:04:18 +00:00
Sanjay Patel 0b4ae34c2f [DAGCombiner] remove redundant fold for X*1.0; NFC
This is handled at node creation time (similar to X/1.0)
after:
rL357029
(no fast-math-flags needed)

llvm-svn: 368443
2019-08-09 14:30:59 +00:00
Jinsong Ji 6349ce5ca5 [MachinePipeliner] Avoid indeterminate order in FuncUnitSorter
Summary:
This is exposed by adding a new testcase in PowerPC in
https://reviews.llvm.org/rL367732

The testcase got different output on different platform, hence breaking
buildbots.

The problem is that we get differnt FuncUnitOrder when calculateResMII.

The root cause is:
1. Two MachineInstr might get SAME priority(MFUsx) from minFuncUnits.
2. Current comparison operator() will return `MFUs1 > MFUs2`.
3. We use iterators for MachineInstr, so the input to FuncUnitSorter
   might be different on differnt platform due to the iterator nature.

So for two MI with same MFU, their order is actually depends on the
iterator order, which is platform (implemtation) dependent.

This is risky, and may cause cross-compiling problems.

The fix is to check make sure we assign a determine order when they are
equal.

Reviewers: bcahoon, hfinkel, jmolloy

Subscribers: nemanjai, hiraditya, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65992

llvm-svn: 368441
2019-08-09 14:10:57 +00:00
Whitney Tsang dd3b6498b0 Title: Loop Cache Analysis
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:

Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.

The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.

The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:

Partition memory references that exhibit temporal or spatial reuse into
reference groups.
For each loop L in the a loop nest LN: a. Compute the cost of the
reference group b. Compute the 'cache cost' of the loop nest by summing
up the reference groups costs
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459

llvm-svn: 368439
2019-08-09 13:56:29 +00:00
Simon Pilgrim 60394f47b0 [X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825)
As discussed on PR42825, if we are inverting the selection mask we can just swap the inputs and avoid the inversion.

Differential Revision: https://reviews.llvm.org/D65522

llvm-svn: 368438
2019-08-09 12:44:20 +00:00
Sanjay Patel 991834a516 [GlobalOpt] prevent crashing on large integer types (PR42932)
This is a minimal fix (copy the predicate for the assert) to
prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=42932
...when converting a constant integer of arbitrary width to uint64_t.

Differential Revision: https://reviews.llvm.org/D65970

llvm-svn: 368437
2019-08-09 12:43:25 +00:00
Simon Atanasyan 242c5a70d4 [Mips][Codegen] Fix fast-isel mixing of FGR64 and AFGR64 registers
Fast-isel was picking AFGR64 register class for processing call
arguments when +fp64 options was used. We simply check is option +fp64
is used and pick appropriate register.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D65886

llvm-svn: 368433
2019-08-09 12:02:32 +00:00
Andrea Di Biagio cbec9af6bf [MCA] Add flag -show-encoding to llvm-mca.
Flag -show-encoding enables the printing of instruction encodings as part of the
the instruction info view.

Example (with flags -mtriple=x86_64--  -mcpu=btver2):

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[7]: Encoding Size

[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:     Instructions:
 1      2     1.00                         4     c5 f0 59 d0    vmulps   %xmm0, %xmm1, %xmm2
 1      4     1.00                         4     c5 eb 7c da    vhaddps  %xmm2, %xmm2, %xmm3
 1      4     1.00                         4     c5 e3 7c e3    vhaddps  %xmm3, %xmm3, %xmm4

In this example, column Encoding Size is the size in bytes of the instruction
encoding. Column Encodings reports the actual instruction encodings as byte
sequences in hex (objdump style).

The computation of encodings is done by a utility class named mca::CodeEmitter.

In future, I plan to expose the CodeEmitter to the instruction builder, so that
information about instruction encoding sizes can be used by the simulator. That
would be a first step towards simulating the throughput from the decoders in the
hardware frontend.

Differential Revision: https://reviews.llvm.org/D65948

llvm-svn: 368432
2019-08-09 11:26:27 +00:00
Pablo Barrio 3cdd586be2 [AArch64] Set pref. func. align to 8 bytes on Neoverse E1 & Cortex-A65
Summary:
The Arm Neoverse E1 and Cortex-A65 Software Optimization Guide [1][2],
Section "4.7 Branch instruction alignment" state:

"It is preferable for branch targets, including subroutine entry points,
to be placed on aligned 64-bit boundaries to maximize instruction fetch
efficiency."

This patch sets the preferred function alignment on Neoverse E1 and
Cortex-A65 to 2^3=8B. This was already the case in some Cortex-A CPUs
such as Cortex-A53.

[1] https://developer.arm.com/docs/swog466751/latest/arm-neoversetm-e1-core-software-optimization-guide
[2] https://developer.arm.com/docs/swog010045/latest/arm-cortex-a65-core-software-optimization-guide

Reviewers: dmgreen, fhahn, samparker

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65937

llvm-svn: 368431
2019-08-09 11:05:15 +00:00
Tim Northover 01eb869114 AArch64: support TLS on Darwin platforms in GlobalISel.
All TLS access on Darwin is in the "general dynamic" form where we call
a function to resolve the address, so implementation is pretty simple.

llvm-svn: 368418
2019-08-09 09:32:38 +00:00
Tim Northover e1a5f668b3 GlobalISel: pack various parameters for lowerCall into a struct.
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.

llvm-svn: 368408
2019-08-09 08:26:38 +00:00
Sam Parker 0dba791a25 [ARM][ParallelDSP] Replace SExt uses
As loads are combined and widened, we replaced their sext users
operands whereas we should have been replacing the uses of the sext.
I've added a load of tests, with only a few of them originally
causing assertion failures, the rest improve pattern coverage.

Differential Revision: https://reviews.llvm.org/D65740

llvm-svn: 368404
2019-08-09 07:48:50 +00:00
Bjorn Pettersson d218a3326e [InstSimplify] Report "Changed" also when only deleting dead instructions
Summary:
Make sure that we report that changes has been made
by InstSimplify also in situations when only trivially
dead instructions has been removed. If for example a call
is removed the call graph must be updated.

Bug seem to have been introduced by llvm-svn r367173
(commit 02b9e45a7e), since the code in question
was rewritten in that commit.

Reviewers: spatel, chandlerc, foad

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65973

llvm-svn: 368401
2019-08-09 07:08:25 +00:00
Craig Topper 6179175551 [X86] Remove code that expands truncating stores from combineStore.
We shouldn't form trunc stores that need to be expanded now that
we are using widening legalization.

llvm-svn: 368400
2019-08-09 06:59:53 +00:00
Craig Topper 7e33f11ba7 [X86] Remove stale FIXME from combineMaskedStore. NFC
I believe PR34584 was tracking that FIXME, but its since been
closed and a test case was added.

llvm-svn: 368397
2019-08-09 05:55:41 +00:00
Craig Topper 8c5c09780d [X86] Remove DAG combine expansion of extending masked load and truncating masked store.
The only way to generate these was through promoting legalization
of narrow vectors, but we widen those types now. So we shouldn't
produce these nodes.

llvm-svn: 368396
2019-08-09 05:53:37 +00:00
Craig Topper 509c8774fa [X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG.
More unneeded code since we now legalize narrow vectors by widening.

llvm-svn: 368395
2019-08-09 05:17:52 +00:00
Craig Topper 824961824f [X86] Remove ISD::SETCC handling from ReplaceNodeResults.
This is no longer needed since we widen v2i32 instead of promoting.

llvm-svn: 368394
2019-08-09 05:17:48 +00:00
Craig Topper ef5b435b00 [X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE handling in LowerStore now that v2i32 is widened to v4i32.
llvm-svn: 368390
2019-08-09 03:09:43 +00:00
Craig Topper 0da681a2be [X86] Merge v2f32 and v2i32 gather/scatter handling in ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32.
llvm-svn: 368389
2019-08-09 03:09:28 +00:00
Craig Topper 6f81db0f68 [X86] Now unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ReplaceNodeResults.
We rely on the generic type legalizer for this now.

llvm-svn: 368388
2019-08-09 03:09:19 +00:00
Craig Topper d871f638d7 [X86] Simplify ReplaceNodeResults handling for FP_TO_SINT/UINT for vectors to only handle widening.
llvm-svn: 368387
2019-08-09 03:09:10 +00:00
Craig Topper 0bd44d59db [X86] Simplify ReplaceNodeResults handling for SIGN_EXTEND/ZERO_EXTEND/TRUNCATE for vectors to only handle widening.
llvm-svn: 368386
2019-08-09 03:08:54 +00:00
Craig Topper cdb9a8ebd8 [X86] Simplify ReplaceNodeResults handling for UDIV/UREM/SDIV/SREM for vectors to only handle widening.
llvm-svn: 368385
2019-08-09 03:08:45 +00:00
Craig Topper 35848345f0 [X86] Remove vector promotion handling from the ReplaceNodeResults ISD::MUL handling code.
We now widen illegal vector types so we don't need this anymore.

llvm-svn: 368384
2019-08-09 03:08:28 +00:00
David Blaikie 0fcc1f7bac DebugInfo/DWARF: Provide some (pretty half-hearted) error handling access when parsing units
This isn't the most robust error handling API, but does allow clients to
opt-in to getting Errors they can handle. I suspect the long-term
solution would be to move away from the lazy unit parsing and have an
explicit step that parses the unit and then allows access to the other
APIs that require a parsed unit.

llvm-dwarfdump could be expanded to use this (or newer/better API) to
demonstrate the benefit of it - but for now lld will use this in a
follow-up cl which ensures lld can exit non-zero on errors like this (&
provide more descriptive diagnostics including which object file the
error came from).

(error access to later errors when parsing nested DIEs would be good too
- but, again, exposing that without it being a hassle for every consumer
may be tricky)

llvm-svn: 368377
2019-08-09 01:14:33 +00:00
Akira Hatanaka 3e61ed0299 Change the return type of UpgradeARCRuntimeCalls to void
Nothing is using the function return.

llvm-svn: 368367
2019-08-08 23:33:17 +00:00
David Blaikie 5b9508396c Remove else-after-return
llvm-svn: 368364
2019-08-08 23:17:23 +00:00
Peter Collingbourne bb17e46644 Linker: Add support for GlobalIFunc.
GlobalAlias and GlobalIFunc ought to be treated the same by the IR
linker, so we can generalize the code to be in terms of their common
base class GlobalIndirectSymbol.

Differential Revision: https://reviews.llvm.org/D55046

llvm-svn: 368357
2019-08-08 22:09:18 +00:00
Cameron McInally 8416f20f2f [LICM] Support unary FNeg in LICM
Differential Revision: https://reviews.llvm.org/D65908

llvm-svn: 368350
2019-08-08 21:38:31 +00:00
Craig Topper c49d3e6c4d [X86] Improve codegen of v8i64->v8i16 and v16i32->v16i8 truncate with avx512vl, avx512bw, min-legal-vector-width<=256 and prefer-vector-width=256
Under this configuration we'll want to split the v8i64 or v16i32 into two vectors. The default legalization will try to truncate each of those 256-bit pieces one step to 128-bit, concatenate those, then truncate one more time from the new 256 to 128 bits.

With this patch we now truncate the two splits to 64-bits then concatenate those. We have to do this two different ways depending on whether have widening legalization enabled. Without widening legalization we have to manually construct X86ISD::VTRUNC to prevent the ISD::TRUNCATE with a narrow result being promoted to 128 bits with a larger element type than what we want followed by something like a pshufb to grab the lower half of each element to finish the job. With widening legalization we just get the right thing. When we switch to widening by default we can just delete the other code path.

Differential Revision: https://reviews.llvm.org/D65626

llvm-svn: 368349
2019-08-08 21:36:47 +00:00
Craig Topper 9158e54270 [SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer.
We may be able to look to how VSELECT is handled to further
improve this, but this appears to be neutral or an improvement
on the test cases we have.

llvm-svn: 368344
2019-08-08 21:14:08 +00:00
Craig Topper bce4d79f37 [LegalizeTypes] Remove SplitVSETCC helper and just call SplitVecRes_SETCC.
llvm-svn: 368343
2019-08-08 21:13:58 +00:00
Guozhi Wei 80347c3acc [MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 368339
2019-08-08 20:25:23 +00:00
Brian Cain 6dbbd0f343 [llvm-mc] Add reportWarning() to MCContext
Adding reportWarning() to MCContext, so that it can be used from
the Hexagon assembler backend.

llvm-svn: 368327
2019-08-08 19:13:23 +00:00
Craig Topper 9d55e2c85e [X86] Make CMPXCHG16B feature imply CMPXCHG8B feature.
This fixes znver1 so that it properly enables CMPXHG8B. We can
probably remove explicit CMPXCHG8B from CPUs that also have
CMPXCHG16B, but keeping this simple to allow cherry pick to 9.0.

Fixes PR42935.

llvm-svn: 368324
2019-08-08 18:11:17 +00:00
Pirama Arumuga Nainar 0cb2a33dfd [AArch64] Do not emit '#' before immediates in inline asm
Summary:
The A64 assembly language does not require the '#' character to
introduce constant immediate operands.  Avoid the '#' since the AArch64
asm parser does not accept '#' before the lane specifier and rejects the
following:
  __asm__ ("fmla v2.4s, v0.4s, v1.s[%0]" :: "I"(0x1))

Fix a test to not expect the '#' and add a new test case with the above
asm.

Fixes: https://github.com/android-ndk/ndk/issues/1036

Reviewers: peter.smith, kristof.beyls

Subscribers: javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65550

llvm-svn: 368320
2019-08-08 17:50:39 +00:00
Akira Hatanaka ecde8c7ad4 [ObjC][ARC] Upgrade calls to ARC runtime functions to intrinsic calls if
the bitcode has the arm64 retainAutoreleasedReturnValue marker

The ARC middle-end passes stopped optimizing or transforming bitcode
that has been compiled with old compilers after we started emitting
calls to ARC runtime functions as intrinsic calls instead of normal
function calls in the front-end and made changes to teach the ARC
middle-end passes about those intrinsics (see r349534). This patch
converts calls to ARC runtime functions that are not intrinsic functions
to intrinsic function calls if the bitcode has the arm64
retainAutoreleasedReturnValue marker. Checking for the presence of the
marker is necessary to make sure we aren't changing ARC function calls
that were originally MRR message sends (see r349952).

rdar://problem/53280660

Differential Revision: https://reviews.llvm.org/D65902

llvm-svn: 368311
2019-08-08 16:59:31 +00:00
Simon Pilgrim eb7a553db8 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368307

llvm-svn: 368308
2019-08-08 16:05:23 +00:00
Simon Pilgrim 67c246bbe6 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

llvm-svn: 368307
2019-08-08 15:54:20 +00:00
David Tenty 8558aac82c Enable assembly output of local commons for AIX
Summary:
This patch enable assembly output of local commons for AIX using .lcomm
directives. Adds a EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the
AIX version of .lcomm assembly directives which include a csect name. Handle the
case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds
a test for generating .lcomm on AIX Targets.

Reviewers: cebowleratibm, hubert.reinterpretcast, Xiangling_L, jasonliu, sfertile

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64825

llvm-svn: 368306
2019-08-08 15:40:35 +00:00
David Green 27ca82f32a [ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.

Differential Revision: https://reviews.llvm.org/D63840

llvm-svn: 368305
2019-08-08 15:27:58 +00:00
David Green 824ffd8b12 [ARM] MVE big endian loads/stores
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.

Differential Revision: https://reviews.llvm.org/D65583

llvm-svn: 368304
2019-08-08 15:15:19 +00:00
Sam Elliott 856d5c5817 [RISCV] Allow ABI Names in Inline Assembly Constraints
Summary:
Clang will replace references to registers using ABI names in inline
assembly constraints with references to architecture names, but other
frontends do not. LLVM uses the regular assembly parser to parse inline asm,
so inline assembly strings can contain references to registers using their
ABI names.

This patch adds support for parsing constraints using either the ABI name or
the architectural register name. This means we do not need to implement the
ABI name replacement code in every single frontend, especially those like
Rust which are a very thin shim on top of LLVM IR's inline asm, and that
constraints can more closely match the assembly strings they refer to.

Reviewers: asb, simoncook

Reviewed By: simoncook

Subscribers: hiraditya, rbar, johnrusso, JDevlieghere, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65947

llvm-svn: 368303
2019-08-08 14:59:16 +00:00
Sam Elliott cd44aee3da [RISCV] Minimal stack realignment support
Summary:
Currently the RISC-V backend does not realign the stack. This can be an issue even for the RV32I/RV64I ABIs (where the stack is 16-byte aligned), though is rare. It will be much more comment with RV32E (though the alignment requirements for common data types remain under-documented...).

This patch adds minimal support for stack realignment. It should cope with large realignments. It will error out if the stack needs realignment and variable sized objects are present.

It feels like a lot of the code like getFrameIndexReference and determineFrameLayout could be refactored somehow, as right now it feels fiddly and brittle. We also seem to allocate a lot more memory than GCC does for equivalent C code.

Reviewers: asb

Reviewed By: asb

Subscribers: wwei, jrtc27, s.egerton, MaskRay, Jim, lenary, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62007

llvm-svn: 368300
2019-08-08 14:40:54 +00:00
Thomas Preud'homme b1add2b774 [FileCheck] Add missing includes in header
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65778

llvm-svn: 368297
2019-08-08 13:56:59 +00:00
Tim Corringham 4f64f1ba3c Add llvm.licm.disable metadata
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.

Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.

Differential Revision: https://reviews.llvm.org/D64557

llvm-svn: 368296
2019-08-08 13:46:17 +00:00
Simon Pilgrim 59fabf9c60 [X86][SSE] matchBinaryPermuteShuffle - split INSERTPS combines
We need to prefer INSERTPS with zeros over SHUFPS, but fallback to INSERTPS if that fails.

llvm-svn: 368292
2019-08-08 13:23:53 +00:00
Simon Pilgrim e2e366797e [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368276
2019-08-08 10:37:03 +00:00
Andrea Di Biagio 987331671f [MCA] Remove dependency from InstrBuilder in mca::Context. NFC
InstrBuilder is not required to construct the default pipeline.

llvm-svn: 368275
2019-08-08 10:30:58 +00:00
Petar Avramovic caef930699 [MIPS GlobalISel] Select jump_table and brjt
G_JUMP_TABLE and G_BRJT appear from translation of switch statement.
Select these two instructions for MIPS32, both pic and non-pic.

Differential Revision: https://reviews.llvm.org/D65861

llvm-svn: 368274
2019-08-08 10:21:12 +00:00
George Rimar d3963051c4 [yaml2obj/obj2yaml] - Add a basic support for extended section indexes.
In some cases a symbol might have section index == SHN_XINDEX.
This is an escape value indicating that the actual section header index
is too large to fit in the containing field.
Then the SHT_SYMTAB_SHNDX section is used. It contains the 32bit values
that stores section indexes.

ELF gABI says that there can be multiple SHT_SYMTAB_SHNDX sections,
i.e. for example one for .symtab and one for .dynsym
(1) https://groups.google.com/forum/#!topic/generic-abi/-XJAV5d8PRg
(2) DT_SYMTAB_SHNDX: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html

In this patch I am only supporting a single SHT_SYMTAB_SHNDX associated
with a .symtab. This is a more or less common case which is used a few tests I saw in LLVM.

I decided not to create the SHT_SYMTAB_SHNDX section as "implicit",
but implement is like a kind of regular section for now.
i.e. tools do not recreate this section or its content, like they do for
symbol table sections, for example. That should allow to write all kind of
possible broken test cases for our needs and keep the output closer to requested.

Differential revision: https://reviews.llvm.org/D65446

llvm-svn: 368272
2019-08-08 09:49:05 +00:00
Sam Tebbs 7ca980edcd [ARM] Select VFMA
llvm-svn: 368264
2019-08-08 08:21:01 +00:00
Craig Topper 724c6053ac [X86] Remove -x86-experimental-vector-widening-legalization command line option and all its uses.
This option is now defaulted to true and we don't want to support
turning it off so remove the option.

llvm-svn: 368258
2019-08-08 06:48:22 +00:00
David Green 1becefd3f7 [ARM] Tighten up VLDRH.32 with low alignments
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixed some incorrect shift amounts, which seemed to
be passing a multiple not a shift.

Differential Revision: https://reviews.llvm.org/D65580

llvm-svn: 368256
2019-08-08 06:22:03 +00:00
Craig Topper 0aacc7da8b [X86] Add CMOV_FR32X and CMOV_FR64X to the isCMOVPseudo function.
llvm-svn: 368250
2019-08-08 04:40:59 +00:00
Amy Huang 0b870b969f Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
with a fix to clear the SDNode map when SelectionDAG is cleared.

llvm-svn: 368230
2019-08-07 22:49:40 +00:00
Johannes Doerfert d1b79e0774 [Attributor][Stats] Locate statistics tracking with the attributes
Summary:
The ever growing switch required Attribute::AttrKind values but they
might not be available for all abstract attributes we deduce. With the
new method we track statistics at the abstract attribute level. The
provided macros simplify the usage and make the messages uniform.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65732

llvm-svn: 368227
2019-08-07 22:46:11 +00:00
Johannes Doerfert beb5150f47 [Attributor][NFC] Code simplification and style normalization
llvm-svn: 368225
2019-08-07 22:36:15 +00:00
Johannes Doerfert 344d038960 [Attributor] Introduce a state wrapper class
Summary:
The wrapper reduces boilerplate code and also provide a nice way to
determine the state type used by an abstract attributes statically via
AAType::StateType.

This was already discussed as part of the review of D65711.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65786

llvm-svn: 368224
2019-08-07 22:34:26 +00:00
Johannes Doerfert d620781872 [Attributor][NFC] Avoid unnecessary liveness queries
If we know everything is live there is no need to query for liveness.
Indicating a pessimistic fixpoint will cause the state to be "invalid"
which will cause the Attributor to not return the AAIsDead on request,
which will prevent us from querying isAssumedDead().

llvm-svn: 368223
2019-08-07 22:32:38 +00:00
Johannes Doerfert 14a0493a88 [Attributor] Provide easier checkForallReturnedValues functionality
Summary:
So far, whenever one wants to look at returned values, one had to deal
with the AAReturnedValues and potentially with the AAIsDead attribute.
In the same spirit as other checkForAllXXX methods, we add this
functionality now to the Attributor. By adopting the use sites we got
better results when return instructions were dead.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65733

llvm-svn: 368222
2019-08-07 22:27:24 +00:00
Craig Topper 005b22855e [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave
If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it.

Improves one of the cases from PR42674

Differential Revision: https://reviews.llvm.org/D65896

llvm-svn: 368215
2019-08-07 21:44:14 +00:00
David Blaikie 1b1f1d6677 DebugInfo/DWARF: Remove unused return type from DWARFUnit::extractDIEsIfNeeded
llvm-svn: 368212
2019-08-07 21:31:33 +00:00
Craig Topper 7f7ef0208b [X86] Allow pack instructions to be used for 512->256 truncates when -mprefer-vector-width=256 is causing 512-bit vectors to be split
If we're splitting the 512-bit vector anyway and we have zero/sign bits, then we might as well use pack instructions to concat and truncate at once.

Differential Revision: https://reviews.llvm.org/D65904

llvm-svn: 368210
2019-08-07 21:16:10 +00:00
Bob Haarman 885fa02da9 Revert r367501 "Create unique, but identically-named ELF sections..."
This reverts commit fbc563e2cb "Create
unique, but identically-named ELF sections for explicitly-sectioned
functions and globals when using -function-sections and
-data-sections."

Reason for revert: sections are created with potentially wrong
attributes.

llvm-svn: 368204
2019-08-07 20:45:23 +00:00
David Blaikie 353938ec68 Fix indentation
llvm-svn: 368198
2019-08-07 19:09:31 +00:00
Craig Topper 66c08430f6 [ValueTracking] When calculating known bits for integer abs, make sure we're looking at a negate and not just any instruction with the nsw flag set.
The matchSelectPattern code can match patterns like (x >= 0) ? x : -x
for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x).
If the latter form was matched we can only use the nsw flag if its
set on both subtracts.

This match makes sure we're looking at the former case only.

Differential Revision: https://reviews.llvm.org/D65692

llvm-svn: 368195
2019-08-07 18:28:16 +00:00
Stefan Stipanovic aaa5270c53 [Attributor] Introduce checkForAllReadWriteInstructions(...).
Summary: Similarly to D65731 `Attributor::checkForAllReadWriteInstructions` is introduced.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65825

llvm-svn: 368194
2019-08-07 18:26:02 +00:00
Nikolai Bozhenov 03edcd68dd [SCEV] Return zero from computeConstantDifference(X, X)
Without this patch computeConstantDifference returns None for cases like
these:

  computeConstantDifference(%x, %x)
  computeConstantDifference({%x,+,16}, {%x,+,16})

Differential Revision: https://reviews.llvm.org/D65474

llvm-svn: 368193
2019-08-07 17:38:38 +00:00
Florian Hahn d8c3c17394 [DataLayout] Check StackNatural and FunctionPtr alignments.
MaybeAlignment asserts that the passed in value is == 0 or a power of 2.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16272

Reviewers: michaelplatings, gchatelet, jakehehrlich, jfb

Reviewed By: gchatelet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65858

llvm-svn: 368191
2019-08-07 17:20:55 +00:00
David Blaikie 90146cd8b9 DebugInfo/DWARF: Normalize DWARFObject members on the DWARF spec section names
Some of these names were abbreviated, some were not, some pluralised,
some not. Made the API difficult to use - since it's an exact 1:1
mapping to the DWARF sections - use those names (changing underscore
separation for camel casing).

llvm-svn: 368189
2019-08-07 17:18:11 +00:00
Nico Weber 1919317929 Support: Remove needless allocation when getMainExecutable() calls readlink()
We built a StringRef from a string literal which we then converted to a
std::string to call c_str().  Just use a pointer to the string literal
instead of a StringRef.

No behavior change.

Differential Revision: https://reviews.llvm.org/D65890

llvm-svn: 368187
2019-08-07 17:00:19 +00:00
Craig Topper 8b5f2ab2a4 Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183
2019-08-07 16:24:26 +00:00
Oliver Cruickshank 4d4eefda6c [ARM] Expand CTPOP intrinsic for MVE
llvm-svn: 368180
2019-08-07 15:47:45 +00:00
Jay Foad 8e8b295835 [InstCombine] Propagate fast math flags through selects
Summary:
In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the
outer op into both arms of the new select, to take advantage of
simplifications that require fast math flags.

Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00

Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65658

llvm-svn: 368175
2019-08-07 15:16:28 +00:00
Cameron McInally 303b6dbfb4 [EarlyCSE] Add support for unary FNeg to EarlyCSE
Differential Revision: https://reviews.llvm.org/D65815

llvm-svn: 368171
2019-08-07 14:34:41 +00:00
Tim Northover 3c10f346dc GlobalISel: factor common code from translateCall and translateInvoke. NFC.
llvm-svn: 368166
2019-08-07 12:43:53 +00:00
Simon Pilgrim d52bc482a5 [X86] EltsFromConsecutiveLoads - early out for non-byte sized memory (PR42909)
Don't attempt to merge loads for types that aren't modulo 8-bits.

llvm-svn: 368165
2019-08-07 12:41:59 +00:00
Sander de Smalen 1d2bfa4a86 [AArch64][WinCFI] Do not pair callee-save instructions in LoadStoreOptimizer
Prevent the LoadStoreOptimizer from pairing any load/store instructions with
instructions from the prologue/epilogue if the CFI information has encoded the
operations as separate instructions.  This would otherwise lead to a mismatch
of the actual prologue size from the size as recorded in the Windows CFI.

Reviewers: efriedma, mstorsjo, ssijaric

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65817

llvm-svn: 368164
2019-08-07 12:41:38 +00:00
Simon Atanasyan e5fa049efa [mips] Make a couple of class methods plain static functions. NFC
llvm-svn: 368162
2019-08-07 12:21:41 +00:00
Simon Atanasyan 8a7c0e7c0a [mips] Use isMicroMips() function to check enabled feature flag. NFC
llvm-svn: 368161
2019-08-07 12:21:32 +00:00
Simon Atanasyan 9f2e076f27 [Mips] Instruction `sc` now accepts symbol as an argument
Function MipsAsmParser::expandMemInst() did not properly handle
instruction `sc` with a symbol as an argument because first argument
would be counted twice. We add additional checks and handle this case
separately.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D64252

llvm-svn: 368160
2019-08-07 12:21:26 +00:00
Benjamin Kramer ea134f221f [Support] Base SmartMutex on std::recursive_mutex
- Remove support for non-recursive mutexes. This was unused.
- The std::recursive_mutex is now created/destroyed unconditionally.
  Locking is still only done if threading is enabled.
- Alias SmartScopedLock to std::lock_guard.

This should make no semantic difference on the existing APIs.

llvm-svn: 368158
2019-08-07 11:59:57 +00:00
Igor Kudrin 45ee93323b Remove support for 32-bit offsets in utility classes (5/5)
Differential Revision: https://reviews.llvm.org/D65641

llvm-svn: 368156
2019-08-07 11:44:47 +00:00
Simon Pilgrim 0eafe011ca [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE
In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway

llvm-svn: 368155
2019-08-07 11:43:13 +00:00
Benjamin Kramer 3d5360a439 Replace llvm::MutexGuard/UniqueLock with their standard equivalents
All supported platforms have <mutex> now, so we don't need our own
copies any longer. No functionality change intended.

llvm-svn: 368149
2019-08-07 10:57:25 +00:00
Oliver Cruickshank 30dcae0956 [ARM] Generate MVE VHADDs/VHSUBs
llvm-svn: 368146
2019-08-07 10:26:57 +00:00
Roman Lebedev 9bece444dd [InstCombine] Recommit: Shift amount reassociation: shl-trunc-shl pattern
This was initially committed in r368059 but got reverted in r368084
because there was a faulty logic in how the shift amounts type mismatch
was being handled (it simply wasn't).

I've added an explicit bailout before we SimplifyAddInst() - i don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExpr's.

I have also changed the common type deduction, to not just blindly
look past zext, but try to do that so that in the end types match.

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368141
2019-08-07 09:41:50 +00:00
Rui Ueyama cac8df1ab9 Re-submit r367649: Improve raw_ostream so that you can "write" colors using operator<<
The original patch broke buildbots, perhaps because it changed the
default setting whether colors are enabled or not.

llvm-svn: 368131
2019-08-07 08:08:17 +00:00
Sam Parker 173de03740 [ARM][LowOverheadLoops] Revert after read/write
Currently we check whether LR is stored/loaded to/from inbetween the
loop decrement and loop end pseudo instructions. There's two problems
here:
- It relies on all load/store instructions being labelled as such in
  tablegen.
- Actually any use of loop decrement is troublesome because the value
  doesn't exist!
    
So we need to check for any read/write of LR that occurs between the
two instructions and revert if we find anything.

Differential Revision: https://reviews.llvm.org/D65792

llvm-svn: 368130
2019-08-07 07:39:19 +00:00
Yevgeny Rouban cb87f3734b Force check prof branch_weights consistency in SwitchInstProfUpdateWrapper
This patch turns on the prof branch_weights metadata consistency
check in SwitchInstProfUpdateWrapper.

If this patch causes a failure then please before reverting do report
the IR that hits the assertion and try identifying the pass that
introduces the inconsistency. We have to fix all such passes.

See also the upcoming change https://reviews.llvm.org/D61179
in the Verifier.

Reviewers: davidx, nikic, eraman, reames, chandlerc
Reviewed By: davidx
Differential Revision: https://reviews.llvm.org/D64061

llvm-svn: 368129
2019-08-07 07:17:45 +00:00
Craig Topper f192cc587c [X86] Allow any 8-bit immediate to be used with bt/btc/btr/bts memory aliases.
We have aliases that disambiguate memory forms of bt/btc/btr/bts
without suffixes to the 32-bit form. These aliases should have
been updated when the instructions were updated in r356413.

llvm-svn: 368127
2019-08-07 06:17:58 +00:00
Craig Topper 624980037d [X86] Use isInt<8> to simplify some code. NFC
llvm-svn: 368126
2019-08-07 06:17:55 +00:00
Kai Luo 02b8056cc1 [MachineCSE][NFC] Use 'profitable' rather than 'beneficial' to name method.
llvm-svn: 368124
2019-08-07 05:40:21 +00:00
Craig Topper 29688f4da0 [X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.

Fixes PR42899.

llvm-svn: 368123
2019-08-07 05:34:27 +00:00
Alex Brachet c22d9666fc [yaml2obj] Move core yaml2obj code into lib and include for use in unit tests
Reviewers: jhenderson, rupprecht, MaskRay, grimar, labath

Reviewed By: rupprecht

Subscribers: gribozavr, mgrang, seiya, mgorny, sbc100, hiraditya, aheejin, jakehehrlich, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65255

llvm-svn: 368119
2019-08-07 02:44:49 +00:00
Alex Lorenz 8d5c280316 TLI: darwin does not support _bcmp
Not all Darwin targets support _bcmp in all circumstances.

Differential Revision: https://reviews.llvm.org/D65834

llvm-svn: 368113
2019-08-07 00:03:37 +00:00
Mitch Phillips 924359dc0f Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors."
This reverts commit fc33e33776.

This commit depends on the rolled back commit rL367901, and thus needs
to be rolled back.

llvm-svn: 368109
2019-08-06 23:38:14 +00:00
Mitch Phillips bd0d97e1c4 Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107
2019-08-06 23:00:43 +00:00
Bill Wendling 73be7cf5aa Use parenthses to silence warning.
llvm-svn: 368105
2019-08-06 22:47:47 +00:00
Peter Collingbourne 0930643ff6 hwasan: Instrument globals.
Globals are instrumented by adding a pointer tag to their symbol values
and emitting metadata into a special section that allows the runtime to tag
their memory when the library is loaded.

Due to order of initialization issues explained in more detail in the comments,
shadow initialization cannot happen during regular global initialization.
Instead, the location of the global section is marked using an ELF note,
and we require libc support for calling a function provided by the HWASAN
runtime when libraries are loaded and unloaded.

Based on ideas discussed with @evgeny777 in D56672.

Differential Revision: https://reviews.llvm.org/D65770

llvm-svn: 368102
2019-08-06 22:07:29 +00:00
Guanzhong Chen b3292a8469 [WebAssembly] Lower ASan constructor priority on Emscripten
Summary:
This change gives Emscripten the ability to use more than one constructor
priorities that runs before ASan. By convention, constructor priorites 0-100
are reserved for use by the system. ASan on Emscripten now uses priority 50,
leaving plenty of room for use by Emscripten before and after ASan.

This change is done in response to:
https://github.com/emscripten-core/emscripten/pull/9076#discussion_r310323723

Reviewers: kripken, tlively, aheejin

Reviewed By: tlively

Subscribers: cfe-commits, dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D65684

llvm-svn: 368101
2019-08-06 21:52:58 +00:00
Peter Collingbourne 411d96f99a IR: Disable verifier check for GlobalValues with private linkage named after a comdat for non-COFF.
This check is only meaningful for COFF and it is perfectly valid to create
such a GlobalValue in ELF.

Differential Revision: https://reviews.llvm.org/D65686

llvm-svn: 368094
2019-08-06 21:47:18 +00:00
Craig Topper ecc1e5d476 [X86] Don't allow combineSIntToFP to create v2i32 vectors after type legalization.
If we're after type legalization we should only be trying to turn
v2i64 into v2i32. So bitcast to v4i32, shuffle the even elements
together. Then use X86ISD::CVTSI2P. The alternative is to leave
the v2i64 type alone and let it scalarized. Hopefully keeping
it packed is better.

Fixes PR42905.

llvm-svn: 368091
2019-08-06 21:43:15 +00:00
Reid Kleckner e4bd38478b Revert [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
This reverts r368059 (git commit 0f95710976)

This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.

llvm-svn: 368084
2019-08-06 20:32:07 +00:00
Craig Topper fc33e33776 [X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors.
With the switch to widening legalization, we need to a better
job of costing extractions of less than 128-bits.

llvm-svn: 368081
2019-08-06 20:12:41 +00:00
Kristina Brooks 26e60f0653 [Attributor][modulemap] Revert r368064 but fix the build
Commit r368064 was necessary after r367953 (D65712) broke the module
build. That happened, apparently, because the template class IRAttribute
defined in the header had a virtual method defined in the corresponding
source file (IRAttribute::manifest). To unbreak the situation this patch
introduces a helper function IRAttributeManifest::manifestAttrs which
is used to implement IRAttribute::manifest in the header. The deifnition
of the helper function is still in the source file.

Patch by jdoerfert (Johannes Doerfert)

Differential Revision: https://reviews.llvm.org/D65821

llvm-svn: 368076
2019-08-06 19:53:19 +00:00
Aditya Nandakumar 6bbfde5c48 [GISel]: Fix trivial build breakage
llvm-svn: 368067
2019-08-06 17:53:04 +00:00
Aditya Nandakumar c8ac029d0a [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries(within a pass and across passes) in the future.
This patch only adds the basic pass boiler plate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.

llvm-svn: 368065
2019-08-06 17:18:29 +00:00
Daniel Sanders d9934d4939 [globalisel] Allow SrcOp to convert an APInt and render it as an immediate operand (MO.isImm() == true)
Summary:
This is tested by D61289 but has been pulled into a separate patch at
a reviewers request.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm, rovka

Reviewed By: arsenm

Subscribers: javed.absar, hiraditya, wdng, kristof.beyls, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61321

llvm-svn: 368063
2019-08-06 17:16:27 +00:00
Roman Lebedev 213817327f [X86] Move CPU features for Barcelona/K10 out of line
Summary:
Cleans X86.td's Barcelona entry to be more like the others,
by moving the features out of the `Proc<>`, thus potentially
making it possible to inherit from them.
Split off from D63628

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65791

llvm-svn: 368061
2019-08-06 17:04:02 +00:00
Roman Lebedev 0f95710976 [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
Summary:
Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles
two shifts one after another. If the shifts are `shl`, we still can
easily perform the fold, with no extra legality checks:
https://rise4fun.com/Alive/OQbM

If we have right-shift however, we won't be able to make it
any simpler than it already is.

After this the only thing missing here is constant-folding: (`NewShAmt >= bitwidth(X)`)
* If it's a logical shift, then constant-fold to `0` (not `undef`)
* If it's a `ashr`, then a splat of original signbit
https://rise4fun.com/Alive/E1K
https://rise4fun.com/Alive/i0V

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368059
2019-08-06 17:03:40 +00:00
Diego Caballero 8bac17709e Re-land D65760/r367944
Fixed most vexing parse ambiguation.

llvm-svn: 368055
2019-08-06 16:24:17 +00:00
Jonas Devlieghere cb6f2646fd [Path] Fix bug in make_absolute logic
This fixes a bug for making path with a //net style root absolute. I
discovered the bug while writing a test case for the VFS, which uses
these paths because they're both legal absolute paths on Windows and
Unix.

Differential revision: https://reviews.llvm.org/D65675

llvm-svn: 368053
2019-08-06 15:46:45 +00:00
Sander de Smalen ad7e95df5a [AArch64] NFC: Generalize emitFrameOffset to support more than byte offsets.
Refactor emitFrameOffset to accept a StackOffset struct as its offset argument.
This method currently only supports byte offsets (MVT::i8) but will be extended
in a later patch to support scalable offsets (MVT::nxv1i8) as well.

Reviewers: thegameg, rovka, t.p.northover, efriedma, greened

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61436

llvm-svn: 368049
2019-08-06 15:06:31 +00:00
Hubert Tong fc34a536d0 [XCOFF][MC] report_fatal_error before dereferencing NULL
This patch replaces a TODO comment with a call to `report_fatal_error`.
The path that reaches the added call to `report_fatal_error` manifestly
dereferences a null pointer.

llvm-svn: 368048
2019-08-06 15:05:20 +00:00
Simon Pilgrim dae5ddad9d [TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops
If we demand no bits/elts from an Op, just return UNDEF

llvm-svn: 368043
2019-08-06 14:30:42 +00:00
Tim Renouf 5a0794327a [StructurizeCFG] Enable -structurizecfg-relaxed-uniform-regions by default
D62198 introduced an option to relax the checks for
hasOnlyUniformBranches. This commit turns the option on by default, for
better code generation in some cases in AMDGPU.

Differential Revision: https://reviews.llvm.org/D63198

Change-Id: I9cbff002a1e74d3b7eb96b4192dc8129936d537d
llvm-svn: 368042
2019-08-06 14:30:19 +00:00
Dmitri Gribenko fc21bb661f Revert "[yaml2obj] Move core yaml2obj code into lib and include for use in unit tests"
This reverts commit r368021, it broke tests.

llvm-svn: 368035
2019-08-06 13:39:50 +00:00
Tim Northover b5abc425d2 AArch64: bail instead of asserting on unexpected type in G_CONSTANT 0.
llvm-svn: 368031
2019-08-06 13:34:08 +00:00
Simon Pilgrim cf62047d29 [X86][SSE] Call SimplifyMultipleUseDemandedBits on PACKSS/PACKUS arguments.
This mainly helps to replace unused arguments with UNDEF in the case where they have multiple users.

llvm-svn: 368026
2019-08-06 13:10:42 +00:00
Sander de Smalen 612b038966 [AArch64] NFC: Add generic StackOffset to describe scalable offsets.
To support spilling/filling of scalable vectors we need a more generic
representation of a stack offset than simply 'int'.

For this we introduce the StackOffset struct, which comprises multiple
offsets sized by their respective MVTs. Byte-offsets will thus be a simple
tuple such as { offset, MVT::i8 }. Adding two byte-offsets will result in a
byte offset { offsetA + offsetB, MVT::i8 }. When two offsets have different
types, we can canonicalise them to use the same MVT, as long as their
runtime sizes are guaranteed to have the same size-ratio as they would have
at compile-time.

When we have both scalable- and fixed-size objects on the stack, we can 
create an offset that is: 

  ({ offset_fixed, MVT::i8 } + { offset_scalable, MVT::nxv1i8 })

The struct also contains a getForFrameOffset() method that is specific to
AArch64 and decomposes the frame-offset to be used directly in instructions
that operate on the stack or index into the stack.

Note: This patch adds StackOffset as an AArch64-only concept, but we would
like to make this a generic concept/struct that is supported by all 
interfaces that take or return stack offsets (currently as 'int'). Since
that would be a bigger change that is currently pending on D32530 landing,
we thought it makes sense to first show/prove the concept in the AArch64
target before proposing to roll this out further.

Reviewers: thegameg, rovka, t.p.northover, efriedma, greened

Reviewed By: rovka, greened

Differential Revision: https://reviews.llvm.org/D61435

llvm-svn: 368024
2019-08-06 13:06:40 +00:00
Simon Pilgrim 01d267dc4f [X86] SimplifyMultipleUseDemandedBits - target shuffles might not be identity
If we don't demand any non-undef shuffle elements then the assert will fail as all shuffle inputs would still be flagged as 'identity' safe.

Exposed by an incoming patch.

llvm-svn: 368022
2019-08-06 12:41:29 +00:00
Alex Brachet 3cfeaa4d2c [yaml2obj] Move core yaml2obj code into lib and include for use in unit tests
Reviewers: jhenderson, rupprecht, MaskRay, grimar, labath

Reviewed By: rupprecht

Subscribers: seiya, mgorny, sbc100, hiraditya, aheejin, jakehehrlich, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65255

llvm-svn: 368021
2019-08-06 12:15:18 +00:00
Igor Kudrin 2836cf0b72 Try to unbreak buildbots after r368014
llvm-svn: 368018
2019-08-06 11:12:13 +00:00
Simon Pilgrim c6735aecfa [X86][SSE] Enable min/max partial reduction
As mentioned on D65047 / rL366933 the plan is to enable partial reduction handling wherever possible.

llvm-svn: 368016
2019-08-06 11:00:34 +00:00
Igor Kudrin f26a70a5e7 Switch LLVM to use 64-bit offsets (2/5)
This updates all libraries and tools in LLVM Core to use 64-bit offsets
which directly or indirectly come to DataExtractor.

Differential Revision: https://reviews.llvm.org/D65638

llvm-svn: 368014
2019-08-06 10:49:40 +00:00
Igor Kudrin f5f35c5cd1 Support 64-bit offsets in utility classes (1/5)
Using 64-bit offsets is required to fully implement 64-bit DWARF.
As these classes are used in many different libraries they should
temporarily support both 32- and 64-bit offsets.

Differential Revision: https://reviews.llvm.org/D64006

llvm-svn: 368013
2019-08-06 10:47:20 +00:00
Ulrich Weigand 7b24dd741c [Strict FP] Allow custom operation actions
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutate to the non-strict action
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)

With this patch, if whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D65226

llvm-svn: 368012
2019-08-06 10:43:13 +00:00
Fangrui Song cb4327d7db Change two unnecessary uses of llvm::size(C) to C.size()
llvm-svn: 368011
2019-08-06 10:24:36 +00:00
Cullen Rhodes ced419f4d7 [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER is capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.

Original modes:
 vector of bases
 base + vector of signed scaled offsets

New modes:
 base + vector of signed unscaled offsets
 base + vector of unsigned scaled offsets
 base + vector of unsigned unscaled offsets

The current behaviour of addressing modes for gather/scatter remains
unchanged.

Patch by Paul Walker.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D65636

llvm-svn: 368008
2019-08-06 09:46:13 +00:00
Tim Northover de98e92bc2 AArch64: use xzr/wzr for constant 0 in GlobalISel.
COPYs from xzr and wzr can often be folded away entirely during register
allocation, unlike a movz.

llvm-svn: 368003
2019-08-06 09:18:41 +00:00
Guillaume Chatelet a7b6a7c851 [LLVM][Alignment] Introduce Alignment In Attributes
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: jfb

Subscribers: hiraditya, dexonsmith, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65742

llvm-svn: 368002
2019-08-06 09:16:33 +00:00
Guillaume Chatelet 396521378f [LLVM][Alignment] Introduce Alignment In GlobalObject
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: jfb

Subscribers: hiraditya, dexonsmith, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65748

Address comments

llvm-svn: 368000
2019-08-06 09:03:21 +00:00
Bill Wendling ebc2cf9c27 Use "isa" since the variable isn't used.
llvm-svn: 367985
2019-08-06 07:27:26 +00:00
Hideki Saito ec818d7fb3 [LV][NFC] Share the LV illegality reporting with LoopVectorize.
Reviewers: hsaito, fhahn, rengolin
 
Reviewed By: rengolin
 
Patch by psamolysov, thanks!
 
Differential Revision: https://reviews.llvm.org/D62997

llvm-svn: 367980
2019-08-06 06:08:48 +00:00
Matt Arsenault f4d3113a5f CodeGen: Migration to using Register
llvm-svn: 367974
2019-08-06 03:59:31 +00:00
Austin Kerbow a05c384132 Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367969
2019-08-06 02:16:11 +00:00
Johannes Doerfert 21fe0a314e [Attributor][NFC] Outline common pattern into helper method
This helper will also allow to also place logic to determine if an
abstract attribute is necessary in the first place.

llvm-svn: 367966
2019-08-06 00:55:11 +00:00
Johannes Doerfert d0f6400978 [Attributor] Provide a generic interface to check live instructions
Summary:
Similar to `Attributor::checkForAllCallSites`, we now provide such
functionality for instructions of a certain opcode through
`Attributor::checkForAllInstructions` and the convenient wrapper
`Attributor::checkForAllCallLikeInstructions`. This cleans up code,
avoids duplication, and simplifies the usage of liveness information.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65731

llvm-svn: 367961
2019-08-06 00:32:43 +00:00
Shiva Chen b12056bd33 [RISCV] Custom legalize i32 operations for RV64 to reduce signed extensions
Differential Revision: https://reviews.llvm.org/D65434

llvm-svn: 367960
2019-08-06 00:24:00 +00:00
Peter Collingbourne f0380bac5f Silence ubsan after r367926.
Fixes e.g.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14273

We can't left shift here because left shifting of a negative number is UB.
The same doesn't apply to unsigned arithmetic, but switching to unsigned
doesn't appear to stop ubsan from complaining, so we need to mask out the
high bits.

llvm-svn: 367959
2019-08-06 00:21:30 +00:00
Puyan Lotfi 37fe40c330 Reverting D65760/r367944 due to buildbot failure.
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/15952/steps/build/logs/stdio

JITTargetMachineBuilder.cpp fails to build.

llvm-svn: 367954
2019-08-05 23:47:07 +00:00
Johannes Doerfert eccdf08577 [Attributor] Introduce the IRAttribute helper struct
Summary:
Certain properties, e.g., an AttrKind, are not shared among all abstract
attributes. This patch extracts the functionality into a helper struct.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65712

llvm-svn: 367953
2019-08-05 23:35:12 +00:00
Johannes Doerfert fb69f7688a [Attributor] Make abstract attributes stateless
To remove boilerplate, mostly passing through values to the
AbstractAttriubute base class, we extract the state into an IRPosition
helper. There is no function change intended but the IRPosition struct
will provide more functionality down the line.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65711

llvm-svn: 367952
2019-08-05 23:32:31 +00:00
Johannes Doerfert 2402062557 [Attributor] Use proper ID for attribute lookup
Summary:
The new scheme is similar to the pass manager and dyn_cast scheme where
we identify classes by the address of a static member. This is better
than the old scheme in which we had to "invent" new Attributor enums if
there was no corresponding one.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65710

llvm-svn: 367951
2019-08-05 23:30:01 +00:00
Johannes Doerfert 007153e9d4 [Attributor][NFCI] Avoid duplication of the InformationCache reference
Summary:
Instead of storing the reference to the InformationCache we now pass it
whenever it might be needed.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65709

llvm-svn: 367950
2019-08-05 23:26:06 +00:00
Johannes Doerfert e83f303938 [Attributor] Deduce the "no-return" attribute for functions
A function is "no-return" if we never reach a return instruction, either
because there are none or the ones that exist are dead.

Test have been adjusted:
  - either noreturn was added, or
  - noreturn was avoided by modifying the code.

The new noreturn_{sync,async} test make sure we do handle invoke
instructions with a noreturn (and potentially nowunwind) callee
correctly, even in the presence of potential asynchronous exceptions.

llvm-svn: 367948
2019-08-05 23:22:05 +00:00
Amara Emerson bc1172df14 [GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler()
Previous name and comment incorrectly implied it was just for formal arg handlers,
which is not true.

llvm-svn: 367945
2019-08-05 23:05:28 +00:00
Diego Caballero 1647758882 [ORC] Add CPU name and sub-target features to detectHost
This commit adds host CPU name and sub-target features to the
`JITTargetMachineBuilder` created by `JITTargetMachineBuilder::detectHost()`.

Differential Revision: https://reviews.llvm.org/D65760

llvm-svn: 367944
2019-08-05 23:02:12 +00:00
Keno Fischer 5c3cdef84b [WebAssembly] Fix conflict between ret legalization and sjlj
Summary:
When the WebAssembly backend encounters a return type that doesn't
fit within i32, SelectionDAG performs sret demotion, adding an
additional argument to the start of the function that contains
a pointer to an sret buffer to use instead. However, this conflicts
with the emscripten sjlj lowering pass. There we translate calls like:

```
	call {i32, i32} @foo()
```

into (in pseudo-llvm)
```
	%addr = @foo
	call {i32, i32} @__invoke_{i32,i32}(%addr)
```

i.e. we perform an indirect call through an extra function.
However, the sret transform now transforms this into
the equivalent of
```
        %addr = @foo
        %sret = alloca {i32, i32}
        call {i32, i32} @__invoke_{i32,i32}(%sret, %addr)
```
(while simultaneously translation the implementation of @foo as well).
Unfortunately, this doesn't work out. The __invoke_ ABI expected
the function address to be the first argument, causing crashes.

There is several possible ways to fix this:
1. Implementing the sret rewrite at the IR level as well and performing
   it as part of lowering to __invoke
2. Fixing the wasm backend to recognize that __invoke has a special ABI
3. A change to the binaryen/emscripten ABI to recognize this situation

This revision implements the middle option, teaching the backend to
treat __invoke_ functions specially in sret lowering. This is achieved
by
1) Introducing a new CallingConv ID for invoke functions
2) When this CallingConv ID is seen in the backend and the first argument
   is marked as sret (a function pointer would never be marked as sret),
   swapping the first two arguments.

Reviewed By: tlively, aheejin
Differential Revision: https://reviews.llvm.org/D65463

llvm-svn: 367935
2019-08-05 21:36:09 +00:00
Johannes Doerfert 3d7bbc6f9c [Attributor][Fix] Do not remove instructions during manifestation
When we remove instructions cached references could still be live. This
patch avoids removing invoke instructions that are replaced by calls and
instead keeps them around but in a dead block.

llvm-svn: 367933
2019-08-05 21:35:02 +00:00
Daniel Sanders eac86ec25f Revert Register/MCRegister: Add conversion operators to avoid use of implicit convert to unsigned. NFC
MSVC finds ambiguity where clang doesn't and it looks like it's not going to be an easy fix
Reverting while I figure out how to fix it

This reverts r367916 (git commit aa15ec3c23)
This reverts r367920 (git commit 5d14efe279)

llvm-svn: 367932
2019-08-05 21:34:45 +00:00
Johannes Doerfert 924d2138fc [Attributor][Fix] Keep invokes if handlers catch asynchronous exceptions
Similar to other places where we transform invokes to calls we need to
be careful if the handler (=personality) can catch asynchronous
exceptions as they are not modeled as part of nounwind.

This is tested with D59978.

llvm-svn: 367931
2019-08-05 21:34:45 +00:00
Eric Christopher 1d73e228db BMI2 support is indicated in bit eight of EBX, not nine.
See Intel SDM, Vol 2A, Table 3-8:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2a-manual.pdf#page=296

Differential Revision: https://reviews.llvm.org/D65766

llvm-svn: 367929
2019-08-05 21:25:59 +00:00
Peter Collingbourne a56d81f4fb llvm-symbolizer: Untag addresses in object files by default.
Any addresses that we pass to llvm-symbolizer are going to be untagged,
while any HWASAN instrumented globals are going to be tagged in the
symbol table. Therefore we need to untag the addresses before using them.

Differential Revision: https://reviews.llvm.org/D65769

llvm-svn: 367926
2019-08-05 20:59:25 +00:00
Lang Hames 1707735fa4 [ORC] Work around broken GCC/libstdc++ by adding an explicit conversion.
This should fix the bots that have been failing due to r367712.

llvm-svn: 367921
2019-08-05 20:30:35 +00:00
Daniel Sanders 5d14efe279 Fix MSVC error after r367916
It seems that MSVC sees ambiguity between the operator==()'s where clang
doesn't

llvm-svn: 367920
2019-08-05 20:03:43 +00:00
Amara Emerson 85e5e28ab4 [AArch64][GlobalISel] Inline tiny memcpy et al at -O0.
FastISel already does this since the initial arm64 port was upstreamed, so
it seems there are no issues with doing this at -O0 for very small memcpys.

Gives a 0.2% geomean code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D65758

llvm-svn: 367919
2019-08-05 20:02:52 +00:00
Dmitri Gribenko 37aa8ad663 Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"
This reverts commit r367882. It broke the test
MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.

llvm-svn: 367904
2019-08-05 18:36:43 +00:00
Craig Topper 3de33245d2 [X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901
2019-08-05 18:25:36 +00:00
Evandro Menezes a005c1ac4f [AArch64] Expand bcmp() for small block lengths
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support the it.  Unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.

In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.

This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate.  No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.

This problem also exists on ARM.  Once a consensus is reached for AArch64, we
can work to fix ARM as well.

Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

Differential revision: https://reviews.llvm.org/D64805

llvm-svn: 367898
2019-08-05 18:09:14 +00:00
Pablo Barrio a8426b43f8 [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
Summary:
The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8 Branch
instruction alignment" states:

"Consider aligning subroutine entry points and branch targets to 32B
boundaries, within the bounds of the code-density requirements of the
program."

This patch sets the preferred function alignment on Neoverse N1 to 2^4=16B.
This was already the case in some of the latest Cortex-A CPUs. Benchmarking
in previous Cortex-A CPUs suggested that 16B alignment is already better
than the default. See commit d04ee305.

The reason we don't set it to 32B right now (as the optimisation guide
suggests) is that this will impact code size and perhaps the instruction
cache performance. Therefore we need benchmark numbers first.

I have also added testing for A75 and A76 that we were missing.

[1] https://developer.arm.com/docs/swog309707/latest

Reviewers: fhahn, greened, samparker, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65654

llvm-svn: 367894
2019-08-05 17:38:58 +00:00
Sanjay Patel 5dbb90bfe1 [InstCombine] combine mul+shl separated by zext
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.

That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.

https://rise4fun.com/Alive/Qzv2

llvm-svn: 367891
2019-08-05 16:59:58 +00:00
Austin Kerbow 8d229dbb47 [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367882
2019-08-05 16:09:49 +00:00
Tom Stellard e15d95a987 AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOs
Summary:
This is a follow up to r367237.  MachineFunction::getMachineMemOperand()
adds the offset parameter to the existing offset instead of resetting it.
So we need to reset the offset to the correct value after calling this
function.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65557

llvm-svn: 367881
2019-08-05 16:08:44 +00:00
Sanjay Patel 1a29823b9c [InstCombine] add extra use constraint for shl-zext fold
As the test shows, we can end up with more instructions than
we started with if we don't include the extra-use check.

llvm-svn: 367880
2019-08-05 16:04:07 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Matt Arsenault 0e0a1c80fb AMDGPU: Correct behavior of f16/i16 non-format store intrinsics
This was switching to use a format store for a non-format store for
f16 types. Also fixes i16/f16 stores on targets without legal f16.

The corresponding loads also need to be fixed.

llvm-svn: 367872
2019-08-05 14:57:59 +00:00
Matt Arsenault ff6b007772 AMDGPU/GlobalISel: Alternative mappings for constants
Without context we assume SGPR. Allowing VGPR constants theoretically
helps avoid a copy. This seems to not actually work now, and the
choice isn't based on the use bank.

llvm-svn: 367871
2019-08-05 14:40:26 +00:00
Matt Arsenault 4e21730300 AMDGPU/GlobalISel: Don't reject shader types
I'm not sure what complications these present, but the current
argument lowering is pretty much directly copied from the DAG
lowering, so I assume these work as they should.

No tests because I'm lazy and things are getting pretty close to the
point where the existing calling-conventions.ll can be shared with
SelectionDAG.

llvm-svn: 367870
2019-08-05 14:40:23 +00:00
Nilanjana Basu da60fc813c Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability
llvm-svn: 367867
2019-08-05 14:16:58 +00:00
Nilanjana Basu b5e4d7de17 Revert "Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability"
This reverts commit a885afa9fa.

llvm-svn: 367861
2019-08-05 13:55:21 +00:00
Cullen Rhodes 2a48176373 [AArch64] Implement initial SVE calling convention support
Summary:

This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.

The SVE AAPCS states [1]:

    z0-z7 are used to pass scalable vector arguments to a subroutine,
    and to return scalable vector results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in such
    registers, it must ensure that the entire contents of z8-z23 are
    preserved across the call. In other cases it need only preserve the
    low 64 bits of z8-z15, as described in §5.1.2.

    p0-p3 are used to pass scalable predicate arguments to a subroutine
    and to return scalable predicate results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in these
    registers, it must ensure that p4-p15 are preserved across the call.
    In other cases it need not preserve any scalable predicate register
    contents.

SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and pass the address) if they exceed the registers used for argument
passing defined by the PCS referenced above.  Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.

[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D65448

llvm-svn: 367859
2019-08-05 13:44:10 +00:00
Nilanjana Basu a885afa9fa Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability
llvm-svn: 367850
2019-08-05 13:11:51 +00:00
Sanjay Patel eaf13044bd [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955

This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.

I combined the earlier legality check into the initial
clause to simplify the code.

So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.

llvm-svn: 367834
2019-08-05 11:27:07 +00:00
Graham Hunter 208d63ea90 [MVT][SVE] Map between scalable vector IR Type and VTs
Adds a two way mapping between the scalable vector IR type and
corresponding SelectionDAG ValueTypes.

Reviewers: craig.topper, jeroen.dobbelaere, fhahn, rengolin, greened, rovka

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D47770

llvm-svn: 367832
2019-08-05 11:18:19 +00:00
Florian Hahn e3ea97b049 [AArch64] Skip isZIPMask check for masks with an odd number of elements.
We process 2 elements at a time and expect the number of elements to be
even. Similar to D60690.

Reviewers: dmgreen, samparker, t.p.northover

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D65400

llvm-svn: 367831
2019-08-05 11:12:23 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
David Bolvansky ef72cded32 [TLI][NFC] Fixed typo
llvm-svn: 367827
2019-08-05 10:14:09 +00:00
Guillaume Chatelet 6c5fb61f8b [LLVM][Alignment] Introduce Alignment In CallingConv
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Subscribers: hiraditya, llvm-commits, courbet, jfb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65659

llvm-svn: 367822
2019-08-05 09:49:09 +00:00
Nicolai Haehnle e204786b6c AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}
Summary:
Wrapping increment/decrement. These aren't exposed by many APIs...

Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c

Reviewers: arsenm, tpr, dstuttard

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65283

llvm-svn: 367821
2019-08-05 09:36:06 +00:00
Oliver Stannard 8ed8353fc4 Reland: Fix and test inter-procedural register allocation for ARM
Add an explicit construction of the ArrayRef, gcc 5 and earlier don't
seem to select the ArrayRef constructor which takes a C array when the
construction is implicit.

Original commit message:

- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
  with a null RegScavenger. Simply not updating the register scavenger
  is fine because IPRA only cares about the SavedRegs vector, the acutal
  code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
  can be clobbered inside a call, even if the compiler can see both
  sides, by linker-generated code.

Differential revision: https://reviews.llvm.org/D64908

llvm-svn: 367819
2019-08-05 09:04:10 +00:00
Guillaume Chatelet 65e4b47aad [LLVM][Alignment] Introduce Alignment Type in DataLayout
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65521

Make getFunctionPtrAlign() return MaybeAlign

llvm-svn: 367817
2019-08-05 09:00:43 +00:00
Michael Pozulp 3046ef5c11 Revert "[llvm-objdump] Re-commit r367284."
This reverts r367776 (git commit d34099926e).
My changes to llvm-objdump tests caused them to fail on windows:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/27368

llvm-svn: 367816
2019-08-05 08:52:28 +00:00
Fangrui Song db26488bf9 [DWARF] Change DWARFDebugLoc::Entry::Loc from SmallVector<char, 4> to SmallString<4>
SmallString has a conversion to StringRef, which can be leveraged to
simplify two use sites.

llvm-svn: 367801
2019-08-05 06:33:52 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Craig Topper 635f5ff580 [X86] Fix a bad early out in combineExtInVec that prevented recursive shuffle combining from running with -x86-experimental-vector-widening-legalization.
llvm-svn: 367798
2019-08-05 03:48:31 +00:00
Johannes Doerfert 305b961f64 [Attributor][NFC] Create some attributes earlier
llvm-svn: 367793
2019-08-04 18:40:01 +00:00
Johannes Doerfert 6471bb6f18 [Attributor][NFC] Improve debug output
llvm-svn: 367792
2019-08-04 18:39:28 +00:00
Johannes Doerfert 4361da24ac [Attributor][Fix] Resolve various liveness issues
Summary:
This contains various fixes:
  - Explicitly determine and return the next noreturn instruction.
  - If an invoke calls a noreturn function which is not nounwind we
    keep the unwind destination live. This also means we require an
    invoke. Though we can still add the unreachable to the normal
    destination block.
  - Check if the return instructions are dead after we look for calls
    to avoid triggering an optimistic fixpoint in the presence of
    assumed liveness information.
  - Make the interface work with "const" pointers.
  - Some simplifications

While additional tests are included, full coverage is achieved only with
D59978.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65701

llvm-svn: 367791
2019-08-04 18:38:53 +00:00
Johannes Doerfert d1c3793563 [Attributor][NFC] Simplify common pattern wrt. fixpoints
When a fixpoint is indicated the change status is known due to the
fixpoint kind. This simplifies a common code pattern by making the
connection explicit.

llvm-svn: 367790
2019-08-04 18:37:38 +00:00
Johannes Doerfert b6acee5c7b [Attributor][NFC] Invalid DerefState is at fixpoint
Summary:
If the DerefBytesState (and thereby the DerefState) is invalid, we
reached a fixpoint for the whole DerefState as we will not
manifest/provide information then.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65586

llvm-svn: 367789
2019-08-04 17:55:15 +00:00
Craig Topper 5a4989e2ac [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

llvm-svn: 367788
2019-08-04 17:30:41 +00:00
Simon Pilgrim 436fd52a71 [X86] lowerShuffleAsSpecificZeroOrAnyExtend - use undef PSHUFB mask indices for ANY_EXTEND shuffles
llvm-svn: 367784
2019-08-04 13:15:23 +00:00
Simon Pilgrim c5891eaa34 Fix signed/unsigned comparison warning. NFC.
llvm-svn: 367783
2019-08-04 12:48:19 +00:00
Simon Pilgrim e16901844d [X86] SimplifyMultipleUseDemandedBits - Add target shuffle support
llvm-svn: 367782
2019-08-04 12:24:40 +00:00
David Green 91296295d0 [ARM] MVE big endian bitcasts
This adds big endian MVE patterns for bitcasts. They are defined in llvm as
being the same as a store of the existing type and the load into the new. This
means that they have to become a VREV between the two types, working in the
same way that NEON works in big-endian. This also adds some example tests for
bigendian, showing where code is and isn't different.

The main difference, especially from a testing perspective is that vectors are
passed as v2f64, and so are VREV into and out of call arguments, and the
parameters are passed in a v2f64 format. Same happens for inline assembly where
the register class is used, so it is VREV to a v16i8.

So some of this is probably not correct yet, but it is (mostly) self-consistent
and seems to be consistent with how llvm treats vectors. The rest we can
hopefully fix later. More details about big endian neon can be found in
https://llvm.org/docs/BigEndianNEON.html.

Differential Revision: https://reviews.llvm.org/D65581

llvm-svn: 367780
2019-08-04 10:18:15 +00:00
Michael Pozulp d34099926e [llvm-objdump] Re-commit r367284.
Add warning messages if disassembly + source for problematic inputs

Summary: Addresses https://bugs.llvm.org/show_bug.cgi?id=41905

Reviewers: jhenderson, rupprecht, grimar

Reviewed By: jhenderson, grimar

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62462

llvm-svn: 367776
2019-08-04 06:04:00 +00:00
Craig Topper 0fff1e4f3d [X86] Consistently use MVT::i8 for the constant operand of BLENDI and INSERTPS nodes.
This is the type listed in the type constraint for isel. But since
we list a type there, it doesn't get checked during isel matching.

llvm-svn: 367775
2019-08-04 06:01:31 +00:00
Craig Topper 76f0f2e0f0 [SelectionDAG] Add node creation debug message to getMemIntrinsicNode.
llvm-svn: 367771
2019-08-04 02:32:06 +00:00
Yonghong Song 44b16bd4a5 [Transforms] Do not drop !preserve.access.index metadata
Currently, when a GVN or CSE optimization happens,
the llvm.preserve.access.index metadata is dropped.
This caused a problem for BPF AbstructMemberOffset phase
as it relies on the metadata (debuginfo types).

This patch added proper hooks in lib/Transforms to
preserve !preserve.access.index metadata. A test
case is added to ensure metadata is preserved under CSE.

Differential Revision: https://reviews.llvm.org/D65700

llvm-svn: 367769
2019-08-03 23:41:26 +00:00
Craig Topper 2edeb8a11a [DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization.
This is further fix for PR42880.

Sanjay already disabled the X86 TLI hook for non-simple types,
but we should really call isTypeLegal here if we're after type
legalization.

llvm-svn: 367768
2019-08-03 23:09:13 +00:00
Sanjay Patel c9171bd0a9 [x86] change free truncate hook to handle only simple types (PR42880)
This avoids the crash from:
https://bugs.llvm.org/show_bug.cgi?id=42880
...and I think it's a proper constraint for the TLI hook.

But that example raises questions about what happens to get us
into this situation (created i29 types) and what happens later
(why does legalization die on those types), so I'm not sure if
we will resolve the bug based on this change.

llvm-svn: 367766
2019-08-03 21:46:27 +00:00
Keno Fischer 3c805d125a [WebAssembly] Fix allocsize attribute in sjlj lowering
Summary:
The allocsize attribute refers to call parameters by index.
Thus, when we add the extra parameter in sjlj lowering, we
need to increment the referenced paramater in the allocsize
attribute to avoid angering the Verifier.

Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D65470

llvm-svn: 367765
2019-08-03 21:38:19 +00:00
Lang Hames 3daccaac8a [JITLink] Add support for MachO/x86-64 UNSIGNED relocs with length=2.
MachO/x86-64 UNSIGNED relocs are almost always 64-bit (length=3), but UNSIGNED
relocs of length=2 are allowed if the target resides in the low 32-bits. This
patch adds support for such relocations in JITLink (previously they would have
triggered an unsupported relocation error).

llvm-svn: 367764
2019-08-03 20:17:10 +00:00
Lang Hames b31229af4f [JITLink] Fix error message formatting.
llvm-svn: 367763
2019-08-03 20:17:08 +00:00
Stefan Stipanovic 7849e41635 [Attributor][NFC] run clang-format on Attributor.cpp
llvm-svn: 367757
2019-08-03 15:27:41 +00:00
Praveen Velliengiri f5c40cb900 Speculative Compilation
[ORC] Remove Speculator Variants for Different Program Representations

[ORC] Block Freq Analysis

Speculative Compilation with Naive Block Frequency

Add Applications to OrcSpeculation

ORC v2 with Block Freq Query & Example

Deleted BenchMark Programs

Signed-off-by: preejackie <praveenvelliengiri@gmail.com>

ORCv2 comments resolved

[ORCV2] NFC

ORCv2 NFC

[ORCv2] Speculative compilation - CFGWalkQuery

ORCv2 Adapting IRSpeculationLayer to new locking scheme

llvm-svn: 367756
2019-08-03 14:42:13 +00:00
Tim Northover a009a60a91 IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755
2019-08-03 14:28:34 +00:00
Sylvestre Ledru 6bf861298a Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367754
2019-08-03 13:51:58 +00:00
Nikita Popov 4f8259bdbc [Thumb] Fix invalid symbol redefinition due to duplicated jumptable (PR42760)
Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr
instruction is duplicated by tail duplication, which results in
the same jumptable with the same label being emitted twice.

Fix this by marking tBR_JTr as not duplicable. The corresponding
ARM/Thumb instructions are already marked as not duplicable.
Additionally also mark tTBB_JT and tTBH_JT to be consistent with
Thumb2, even though this shouldn't be strictly necessary.

Differential Revision: https://reviews.llvm.org/D65606

llvm-svn: 367753
2019-08-03 06:47:23 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Hideto Ueno 96bb347205 [Attributor] Fix dereferenceable callsite argument initialization
llvm-svn: 367748
2019-08-03 04:10:50 +00:00
Amara Emerson c835164a47 Re-commit "[GlobalISel] Add legalization support for non-power-2 loads and stores""
This is an old commit that exposed a bug in the GISel importer, which caused
non-truncating stores to be selected for truncating store patterns. Now that's
been fixed in r367737 this can go back in.

llvm-svn: 367739
2019-08-02 23:44:24 +00:00
Craig Topper b1cfcd1a56 [ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches for expandload/compressstore.
Same as what was done for gather/scatter/load/store in r367489.
Expandload/compressstore were delayed due to lack of constant
masking handling that has since been fixed.

llvm-svn: 367738
2019-08-02 23:43:53 +00:00
Craig Topper 45ea25289d [X86] Use the pointer VT for the Scale node when lowering x86 gather/scatter intrinsics.
This is consistent with the target independent intrinsic handling.

Not sure this really matters since we just pull the constant out
using getZExtValue later.

llvm-svn: 367736
2019-08-02 23:18:16 +00:00
Yonghong Song 37d24a696b [BPF] Handling type conversions correctly for CO-RE
With newly added debuginfo type
metadata for preserve_array_access_index() intrinsic,
this patch did the following two things:
 (1). checking validity before adding a new access index
      to the access chain.
 (2). calculating access byte offset in IR phase
      BPFAbstractMemberAccess instead of when BTF is emitted.

For (1), the metadata provided by all preserve_*_access_index()
intrinsics are used to check whether the to-be-added type
is a proper struct/union member or array element.

For (2), with all available metadata, calculating access byte
offset becomes easier in BPFAbstractMemberAccess IR phase.
This enables us to remove the unnecessary complexity in
BTFDebug.cpp.

New tests are added for
  . user explicit casting to array/structure/union
  . global variable (or its dereference) as the source of base
  . multi demensional arrays
  . array access given a base pointer
  . cases where we won't generate relocation if we cannot find
    type name.

Differential Revision: https://reviews.llvm.org/D65618

llvm-svn: 367735
2019-08-02 23:16:44 +00:00
JF Bastien 748dac7389 Remove support for unsupported MSVC versions
Re-land r367727 with the #if fixed.

Reviewers: rnk, lebedev.ri

Subscribers: hiraditya, jkorous, dexonsmith, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65662

llvm-svn: 367734
2019-08-02 23:09:01 +00:00
Douglas Yung 42618b270d Revert Fix and test inter-procedural register allocation for ARM
This reverts r367669 (git commit f6b00c279a)

This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233

llvm-svn: 367731
2019-08-02 22:11:49 +00:00
JF Bastien 21d01ea9b6 Revert "Remove support for unsupported MSVC versions"
Mismatched preprocessor, I'll fix in a follow-up.

llvm-svn: 367728
2019-08-02 22:02:25 +00:00
JF Bastien dc8af80c19 Remove support for unsupported MSVC versions
Reviewers: rnk, lebedev.ri

Subscribers: hiraditya, jkorous, dexonsmith, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65662

llvm-svn: 367727
2019-08-02 21:52:35 +00:00
Stefan Stipanovic d021617bf7 [Attributor] Using liveness in other attributes.
Modifying other AbstractAttributes to use Liveness AA and skip dead instructions.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, llvm-commits

Differential revision: https://reviews.llvm.org/D65243

llvm-svn: 367725
2019-08-02 21:31:22 +00:00
Amara Emerson 73752abeab [AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly zext-loaded.
These cases can come up when the extending loads combiner doesn't combine a
zext(load) to a zextload op, due to some other operation being in between, which
then gets simplified at a later stage.

Differential Revision: https://reviews.llvm.org/D65360

llvm-svn: 367723
2019-08-02 21:15:36 +00:00
Simon Pilgrim 794f7591ec [TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple.
Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple.

llvm-svn: 367721
2019-08-02 21:07:07 +00:00
Daniel Sanders e7694f34ab Use MCRegister in MCRegisterInfo's interfaces
Summary:
As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it)

Depends on D65599

Reviewers: arsenm

Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65605

llvm-svn: 367719
2019-08-02 20:23:00 +00:00
Philip Reames 511be2a158 [Statepoints] Fix overalignment of loads in no-realign-stack functions
This really should have been part of 366765.  For some reason, I forgot to handle the corresponding load side, and the readable test cases (using deopt vs statepoints) turned out to be overly reduced.  Oops.

As seen in the test change, the problem was that we were using a load with alignment expectations rather than the unaligned variant when the stack alignment was less than that prefered type alignment.

llvm-svn: 367718
2019-08-02 20:17:37 +00:00
Peter Collingbourne 196931a7dd hwasan: Remove unused field CurModuleUniqueId. NFCI.
llvm-svn: 367717
2019-08-02 20:14:58 +00:00
Lang Hames 10430f4174 [ORC] Remove a dead method.
llvm-svn: 367716
2019-08-02 20:09:30 +00:00
Craig Topper de9b1d7912 [ScalarizeMaskedMemIntrin] Add constant mask support to expandload and compressstore scalarization
This adds support for generating all the loads or stores for a constant mask into a single basic block with no conditionals.

Differential Revision: https://reviews.llvm.org/D65613

llvm-svn: 367715
2019-08-02 20:04:34 +00:00
Lang Hames cb391279b4 [ORC] Turn on symbol-flags overrides for LLJIT on Windows by default.
libObject does not apply the Exported flag to symbols in COFF object files,
which can lead to assertions when the symbol flags initially derived from
IR added to the JIT clash with the flags seen by the JIT linker. Both
RTDyldObjectLinkingLayer and ObjectLinkingLayer have a workaround for this:
they can be told to override the flags seen by the linker with the flags
attached to the materialization responsibility object that was passed down
to the linker. This patch modifies LLJIT's setup code to enable this override
by default on platforms where COFF is the default object format.

llvm-svn: 367712
2019-08-02 19:43:20 +00:00
Sanjay Patel 68264558f9 [DAGCombiner] try to convert opposing shifts to casts
This reverses a questionable IR canonicalization when a truncate
is free:

sra (add (shl X, N1C), AddC), N1C -->
sext (add (trunc X to (width - N1C)), AddC')

https://rise4fun.com/Alive/slRC

More details in PR42644:
https://bugs.llvm.org/show_bug.cgi?id=42644

I limited this to pre-legalization for code simplicity because that
should be enough to reverse the IR patterns. I don't have any
evidence (no regression test diffs) that we need to try this later.

Differential Revision: https://reviews.llvm.org/D65607

llvm-svn: 367710
2019-08-02 19:33:46 +00:00
Eric Christopher 5fb56b1966 Temporarily Revert "Changing representation of cv_def_range directives in Codeview debug info assembly format for better readability"
This is breaking bots and the author asked me to revert.

This reverts commit 367704.

llvm-svn: 367707
2019-08-02 19:10:37 +00:00
Nilanjana Basu 1c67521591 Changing representation of cv_def_range directives in Codeview debug info assembly format for better readability
llvm-svn: 367704
2019-08-02 18:44:39 +00:00
Jessica Paquette e4c46c34ce [AArch64][GlobalISel] Support the neg_addsub_shifted_imm32 pattern
Add an equivalent ComplexRendererFns function for SelectNegArithImmed. This
allows us to select immediate adds of -1 by turning them into subtracts.

Update select-binop.mir to show that the pattern works.

Differential Revision: https://reviews.llvm.org/D65460

llvm-svn: 367700
2019-08-02 18:12:53 +00:00
Alina Sbirlea 5545e6963f [SimplifyCFG] Cleanup redundant conditions [NFC].
Summary:
Since the for loop iterates over BB's predecessors, the branch conditions found must have BB as one of the successors.
For an unconditional branch the successor must be BB, added `assert`.
For a conditional branch, one of the two successors must be BB, simplify `else if` to `else` and `assert`.
Sink common instructions outside the if/else block.

Reviewers: sanjoy.google

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65596

llvm-svn: 367699
2019-08-02 18:06:54 +00:00
Daniel Sanders c94c91f55c Fix ARC after r367633
llvm-svn: 367697
2019-08-02 17:52:17 +00:00
Peter Collingbourne 4dcf8800e2 CodeGen: Don't follow aliases when extracting type info.
This fixes a crash in the case where the type info object is an alias
pointing to a non-zero offset within a global or is otherwise unanalyzable
by the stripPointerCasts() function. Looking through the alias is not the
right thing to do anyway for similar reasons as D65118.

Differential Revision: https://reviews.llvm.org/D65314

llvm-svn: 367696
2019-08-02 17:43:45 +00:00
Sanjay Patel 9ce5f41851 [InstCombine] fold cmp+select using select operand equivalence
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.

We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.

Here's an example:
https://rise4fun.com/Alive/Xplr

  %cmp = icmp eq i32 %x, 2147483647
  %add = add nsw i32 %x, 1
  %sel = select i1 %cmp, i32 -2147483648, i32 %add
  =>
  %sel = add i32 %x, 1

I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.

Differential Revision: https://reviews.llvm.org/D65576

llvm-svn: 367695
2019-08-02 17:39:32 +00:00
Lang Hames 809e9d1efa [ORC] Change the locking scheme for ThreadSafeModule.
ThreadSafeModule/ThreadSafeContext are used to manage lifetimes and locking
for LLVMContexts in ORCv2. Prior to this patch contexts were locked as soon
as an associated Module was emitted (to be compiled and linked), and were not
unlocked until the emit call returned. This could lead to deadlocks if
interdependent modules that shared contexts were compiled on different threads:
when, during emission of the first module, the dependence was discovered the
second module (which would provide the required symbol) could not be emitted as
the thread emitting the first module still held the lock.

This patch eliminates this possibility by moving to a finer-grained locking
scheme. Each client holds the module lock only while they are actively operating
on it. To make this finer grained locking simpler/safer to implement this patch
removes the explicit lock method, 'getContextLock', from ThreadSafeModule and
replaces it with a new method, 'withModuleDo', that implicitly locks the context,
calls a user-supplied function object to operate on the Module, then implicitly
unlocks the context before returning the result.

ThreadSafeModule TSM = getModule(...);
size_t NumFunctions = TSM.withModuleDo(
    [](Module &M) { // <- context locked before entry to lambda.
      return M.size();
    });

Existing ORCv2 layers that operate on ThreadSafeModules are updated to use the
new method.

This method is used to introduce Module locking into each of the existing
layers.

llvm-svn: 367686
2019-08-02 15:21:37 +00:00
David Candler 7eacefedab [NFC] Test commit, corrected some spelling in comment
Test commit, corrected some spelling in comment.

Differential Revision: https://reviews.llvm.org/D65516

llvm-svn: 367685
2019-08-02 14:44:17 +00:00
Tim Northover 522fb7eedc GlobalISel: support swiftself attribute
llvm-svn: 367683
2019-08-02 14:09:49 +00:00
Teresa Johnson d2df54e6a5 [ThinLTO] Implement index-based WPD
This patch adds support to the WholeProgramDevirt pass to perform
index-based WPD, which is invoked from ThinLTO during the thin link.

The ThinLTO backend (WPD import phase) behaves the same regardless of
whether the WPD decisions were made with the index-based or (the
existing) IR-based analysis.

Depends on D54815.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55153

llvm-svn: 367679
2019-08-02 13:10:52 +00:00
Martin Storsjo ed7e1cd877 [llvm-dlltool] Clarify an error message. NFC.
The parameter to the -D (--dllname) option is the name of the dll
that llvm-dlltool produces an import library for. Even though this
is named "OutputFile" in the COFFModuleDefinition class, it's not
an output file name in the context of llvm-dlltool, but the name
of the DLL to create an import library for.

llvm-svn: 367676
2019-08-02 11:20:03 +00:00
Oliver Stannard 4b7239ebac [IPRA][ARM] Disable no-CSR optimisation for ARM
This optimisation isn't generally profitable for ARM, because we can
save/restore many registers in the prologue and epilogue using the PUSH
and POP instructions, but mostly use individual LDR/STR instructions for
other spills.

Differential revision: https://reviews.llvm.org/D64910

llvm-svn: 367670
2019-08-02 10:23:17 +00:00
Oliver Stannard f6b00c279a Fix and test inter-procedural register allocation for ARM
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
  with a null RegScavenger. Simply not updating the register scavenger
  is fine because IPRA only cares about the SavedRegs vector, the acutal
  code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
  can be clobbered inside a call, even if the compiler can see both
  sides, by linker-generated code.

Differential revision: https://reviews.llvm.org/D64908

llvm-svn: 367669
2019-08-02 10:23:05 +00:00
Serguei Katkov de67affd00 [Loop Peeling] Introduce an option for profile based peeling disabling.
This patch adds an ability to disable profile based peeling 
causing the peeling of all iterations and as a result prohibits
further unroll/peeling attempts on that loop.

The motivation to get an ability to separate peeling usage in
pipeline where in the first part we peel only separate iterations if needed
and later in pipeline we apply the full peeling which will prohibit further peeling.

Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64983

llvm-svn: 367668
2019-08-02 09:32:52 +00:00
Sam Parker cd38599275 [NFC][ARM[ParallelDSP] Rename/remove/change types
Remove forward declaration, fold a couple of typedefs and change one
to be more useful.

llvm-svn: 367665
2019-08-02 08:21:17 +00:00
Sam Parker 14c6dfdfe2 [NFC][ARM][ParallelDSP] Remove ValueList
We only care about the first element in the list.

llvm-svn: 367660
2019-08-02 07:32:28 +00:00
Rui Ueyama 4d41c332ef Revert r367649: Improve raw_ostream so that you can "write" colors using operator<<
This reverts commit r367649 in an attempt to unbreak Windows bots.

llvm-svn: 367658
2019-08-02 07:22:34 +00:00
Hideki Saito 09fac2450b [LV] Avoid building interleaved group in presence of WAW dependency
Reviewers: hsaito, Ayal, fhahn, anna, mkazantsev

Reviewed By: hsaito

Patch by evrevnov, thanks!

Differential Revision: https://reviews.llvm.org/D63981

llvm-svn: 367654
2019-08-02 06:31:50 +00:00
Rui Ueyama a52f982f1c Improve raw_ostream so that you can "write" colors using operator<<
1. raw_ostream supports ANSI colors so that you can write messages to
the termina with colors. Previously, in order to change and reset
color, you had to call `changeColor` and `resetColor` functions,
respectively.

So, if you print out "error: " in red, for example, you had to do
something like this:

  OS.changeColor(raw_ostream::RED);
  OS << "error: ";
  OS.resetColor();

With this patch, you can write the same code as follows:

  OS << raw_ostream::RED << "error: " << raw_ostream::RESET;

2. Add a boolean flag to raw_ostream so that you can disable colored
output. If you disable colors, changeColor, operator<<(Color),
resetColor and other color-related functions have no effect.

Most LLVM tools automatically prints out messages using colors, and
you can disable it by passing a flag such as `--disable-colors`.
This new flag makes it easy to write code that works that way.

Differential Revision: https://reviews.llvm.org/D65564

llvm-svn: 367649
2019-08-02 04:48:30 +00:00
Serguei Katkov bbdcc82111 [Loop Peeling] Do not close further unroll/peel if profile based peeling was not used.
Current peeling cost model can decide to peel off not all iterations
but only some of them to eliminate conditions on phi. At the same time 
if any peeling happens the door for further unroll/peel optimizations on that
loop closes because the part of the code thinks that if peeling happened
it is profile based peeling and all iterations are peeled off.

To resolve this inconsistency the patch provides the flag which states whether
the full peeling basing on profile is enabled or not and peeling cost model
is able to modify this field like it does not PeelCount.

In a separate patch I will introduce an option to allow/disallow peeling basing
on profile.

To avoid infinite loop peeling the patch tracks the total number of peeled iteration
through llvm.loop.peeled.count loop metadata.

Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64972

llvm-svn: 367647
2019-08-02 04:29:23 +00:00
Stanislav Mekhanoshin 6fe00a21f2 Handle casts changing pointer size in the vectorizer
Added code to truncate or shrink offsets so that we can continue
base pointer search if size has changed along the way.

Differential Revision: https://reviews.llvm.org/D65612

llvm-svn: 367646
2019-08-02 04:03:37 +00:00
Kai Luo fec7da8285 [PowerPC][Peephole] Check if `extsw`'s second operand is a virtual register
Summary:
When combining `extsw` and `sldi` in `PPCMIPeephole`, we have to check
if `extsw`'s second operand is a virtual register, otherwise we might
get miscompile.

Differential Revision: https://reviews.llvm.org/D65315

llvm-svn: 367645
2019-08-02 03:14:17 +00:00
Kang Zhang 038dd43782 [NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts()
Summary:
The old code can be simplified to define the element type of TailCalls as `BasicBlock` not `CallInst`. Also I use the for-range loop instead the for loop.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D64905

llvm-svn: 367644
2019-08-02 03:09:07 +00:00
Eric Christopher 5a00b0772a Temporarily revert "Changes to improve CodeView debug info type record inline comments"
due to a sanitizer failure.

This reverts commit 367623.

llvm-svn: 367640
2019-08-02 01:05:47 +00:00
Daniel Sanders 12961ff0fa Fix up an unused variable warning caused by TRI->isVirtualRegister() -> Register::isVirtualRegister()
llvm-svn: 367637
2019-08-02 00:17:48 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Rong Xu ca161fa008 [PGO] Add PGO support at -O0 in the experimental new pass manager
Add PGO support at -O0 in the experimental new pass manager to sync the
behavior of the legacy pass manager.

Also change the test of gcc-flag-compatibility.c for more complete test:
(1) change the match string to "profc" and "profd" to ensure the
    instrumentation is happening.
(2) add IR format proftext so that PGO use compilation is tested.

Differential Revision: https://reviews.llvm.org/D64029

llvm-svn: 367628
2019-08-01 22:36:34 +00:00
Stanislav Mekhanoshin eee9312a85 Relax load store vectorizer pointer strip checks
The previous change to fix crash in the vectorizer introduced
performance regressions. The condition to preserve pointer
address space during the search is too tight, we only need to
match the size.

Differential Revision: https://reviews.llvm.org/D65600

llvm-svn: 367624
2019-08-01 22:18:56 +00:00
Nilanjana Basu ac7e5788ca Changes to improve CodeView debug info type record inline comments
Signed-off-by: Nilanjana Basu <nilanjana.basu87@gmail.com>
llvm-svn: 367623
2019-08-01 22:05:14 +00:00
Wouter van Oortmerssen 7fee93ed59 [WebAssembly] Fixed relocation errors having no location.
Summary:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=42441

Used to print:

<unknown>:0: error: Cannot represent a difference across sections

(the location was null).

Now prints:

err.s:20:3: error: Cannot represent a difference across sections
  i32.const foo-bar
  ^

Note: I looked at adding a test for this, but I don't think it is
worth it. We're not testing error formatting in the Wasm backend :)

Reviewers: sbc100, jgravelle-google

Subscribers: dschuff, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65602

llvm-svn: 367619
2019-08-01 21:34:54 +00:00
Matt Arsenault d9d30a408e GlobalISel: Lower scalarizing unmerge of a vector to shifts
AMDGPU sometimes has legal s16 and <2 x s16> operations, but all
registers are really 32-bit. An unmerge destination really should ben
widened to a 32-bit register. If widening a scalarizing vector with a
target size that matches the vector size, bitcast to integer and
extract the relevant bits with shifts.

I'm not sure if this is the right place for this. This could arguably
be part of widenScalar for the result. I also have a growing feeling
that we're missing a bitcast legalize action.

llvm-svn: 367604
2019-08-01 19:10:05 +00:00
Sjoerd Meijer e0dfce0723 Follow up of rL367592, fix the build
Some buildbots complained about:
error: default label in switch which covers all enumeration values

llvm-svn: 367603
2019-08-01 18:54:29 +00:00
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that...

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
Matt Arsenault bb582ebdba AMDGPU: Remove v0 workaround for DS_GWS_* instructions
Any register should work for the src field since r366067, since the
used value is not pulled from the expected encoding field.

llvm-svn: 367598
2019-08-01 18:41:32 +00:00
Matt Arsenault e56a2ad85e CodeGen: Allow virtual registers in bundles
The note in the documentation suggests this restriction is a compile
time optimization for architectures that make heavy use of
bundling. Allowing virtual registers in a bundle is useful for some
(non-R600) AMDGPU use cases and are infrequent enough to matter.

A more common AMDGPU use case has already been using virtual registers
in bundles since r333691, although never calling finalizeBundle on
them and manually creating the use/def list on the BUNDLE
instruction. This is also relatively infrequent, and only happens for
consecutive sequences of some load/store types.

llvm-svn: 367597
2019-08-01 18:41:28 +00:00
Alina Sbirlea 3af2a69575 [SimplifyCFG] Mark missed Changed to true.
Summary:
DominatorTree is invalid after SimplifyCFG because of a missed `Changed = true` when simplifying a branch condition and removing an edge.
Resolves PR42272.

Reviewers: zhizhouy, manojgupta

Subscribers: jlebar, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65490

llvm-svn: 367596
2019-08-01 18:37:34 +00:00
Alina Sbirlea 172838df6b [MemorySSA] Set LoopSimplify to preserve MemorySSA in the NPM, if analysis exists.
Summary:
LoopSimplify is preserved in the legacy pass manager, but not in the new pass manager.
Update LoopSimplify to preserve MemorySSA conditionally when the analysis is available (same behavior as the legacy pass manager).

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65418

llvm-svn: 367594
2019-08-01 18:28:28 +00:00
Matt Arsenault aff2995f46 AMDGPU: Use tablegen pattern for sendmsg intrinsics
Since this now emits a direct copy to m0, SIFixSGPRCopies has to
handle a physical register.

llvm-svn: 367593
2019-08-01 18:27:11 +00:00
Sjoerd Meijer 20b198ec5e [LV] Tail-Loop Folding
This allows folding of the scalar epilogue loop (the tail) into the main
vectorised loop body when the loop is annotated with a "vector predicate"
metadata hint. To fold the tail, instructions need to be predicated (masked),
enabling/disabling lanes for the remainder iterations.

Differential Revision: https://reviews.llvm.org/D65197

llvm-svn: 367592
2019-08-01 18:21:44 +00:00
Matt Arsenault 5faa533e47 GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer
AMDGPU testcase isn't broken now, but will be in a future patch
without this.

llvm-svn: 367591
2019-08-01 18:13:16 +00:00
Wouter van Oortmerssen 87af0b1911 [WebAssembly] Assembler/InstPrinter: support call_indirect type index.
A TYPE_INDEX operand (as used by call_indirect) used to be represented
by the InstPrinter as a symbol (e.g. .Ltype_index0@TYPE_INDEX) which
was a bit of a mismatch with the WasmObjectWriter which expects an
unnamed symbol, to receive the signature from and then turn into a
reloc.

There was really no good way to round-trip this information. An earlier
version of this patch tried to attach the signature information using
a .functype, but that ran into trouble when the symbol was re-emitted
without a name. Removing the name was a giant hack also.

The current version changes the assembly syntax to have an inline
signature spec for TYPEINDEX operands that is always unnamed, which
is much more elegant both in syntax and in implementation (as now the
assembler is able to follow the same path as the regular backend)

Reviewers: sbc100, dschuff, aheejin, jgravelle-google, sunfish, tlively

Subscribers: arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64758

llvm-svn: 367590
2019-08-01 18:08:26 +00:00
Simon Pilgrim 1d183b407a [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367588
2019-08-01 17:46:44 +00:00
Simon Pilgrim 63d4114f72 [X86][SSE] Add PEXTR*(PINSR*(v, s, c), c) -> s combine.
We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll.

llvm-svn: 367582
2019-08-01 16:38:39 +00:00
Johannes Doerfert da4d811707 [Attributor][FIX] Indicate a missing update change
User of AAReturnedValues need to know if HasOverdefinedReturnedCalls
changed from false to true as it will impact the result of the return
value traversal (calls are not ignored anymore).

This will be tested with the tests in D59978.

llvm-svn: 367581
2019-08-01 16:21:54 +00:00
Simon Atanasyan 0620cf11ec [mips] Fix lowering load/store instruction in PIC case
If an operand of the `lw/sw` instructions is a symbol, these instructions
incorrectly lowered using not-position-independent chain of commands.
For PIC code we should use `lw/addiu` instructions with the `R_MIPS_GOT16`
and `R_MIPS_LO16` relocations respectively. Instead of that LLVM generates
position dependent code with the `R_MIPS_HI16` and `R_MIPS_LO16`
relocations.

This patch provides a fix for the bug by handling PIC case separately in
the `MipsAsmParser::expandMemInst`. The main idea is to generate a chain
of PIC instructions to load a symbol address into a register and then
load the address content.

The fix is not optimal and does not fix all PIC-related problems. This
is a task for subsequent patches.

Differential Revision: https://reviews.llvm.org/D65524

llvm-svn: 367580
2019-08-01 16:04:29 +00:00
Simon Pilgrim 33f5f863b5 [X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling
This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367570
2019-08-01 14:46:03 +00:00
Simon Pilgrim f99f9881e3 [X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846)
llvm-svn: 367556
2019-08-01 13:13:18 +00:00
Sam Elliott f596f45070 [RISCV] Add Custom Parser for Atomic Memory Operands
Summary:
GCC Accepts both (reg) and 0(reg) for atomic instruction memory
operands. These instructions do not allow for an offset in their
encoding, so in the latter case, the 0 is silently dropped.

Due to how we have structured the RISCVAsmParser, the easiest way to add
support for parsing this offset is to add a custom AsmOperand and
parser. This parser drops all the parens, and just keeps the register.

This commit also adds a custom printer for these operands, which matches
the GCC canonical printer, printing both `(a0)` and `0(a0)` as `(a0)`.

Reviewers: asb, lewis-revill

Reviewed By: asb

Subscribers: s.egerton, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65205

llvm-svn: 367553
2019-08-01 12:42:31 +00:00
Roman Lebedev 081e990d08 [IR] Value: add replaceUsesWithIf() utility
Summary:
While there is always a `Value::replaceAllUsesWith()`,
sometimes the replacement needs to be conditional.

I have only cleaned a few cases where `replaceUsesWithIf()`
could be used, to both add test coverage,
and show that it is actually useful.

Reviewers: jdoerfert, spatel, RKSimon, craig.topper

Reviewed By: jdoerfert

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65528

llvm-svn: 367548
2019-08-01 12:32:08 +00:00
Roman Lebedev 0efeaa8162 [IR] SelectInst: add swapValues() utility
Summary:
Sometimes we need to swap true-val and false-val of a `SelectInst`.
Having a function for that is nicer than hand-writing it each time.

Reviewers: spatel, RKSimon, craig.topper, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65520

llvm-svn: 367547
2019-08-01 12:31:35 +00:00
David Green 1343814fb4 [ARM] Fix for MVE VREV64
The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the
cross-beat nature of the instruction. This adds an earlyclobber to Qd, which
seems to be the same way we deal with this on other instructions like the
write-back on loads and stores.

Differential Revision: https://reviews.llvm.org/D65502

llvm-svn: 367544
2019-08-01 11:22:03 +00:00
Sander de Smalen 7ebccfefb8 [AArch64] Do not allocate unnecessary emergency slot.
Fix an issue where the compiler still allocates an emergency spill slot even
though it already decided to spill an extra callee-save register to use
as a scratch register.

Reviewers: gberry, thegameg, mstorsjo, t.p.northover

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D65504

llvm-svn: 367540
2019-08-01 10:53:45 +00:00
Petar Avramovic 8a40cedfe6 [MIPS GlobalISel] Fold load/store + G_GEP + G_CONSTANT
Fold load/store + G_GEP + G_CONSTANT when
immediate in G_CONSTANT fits into 16 bit signed integer.

Differential Revision: https://reviews.llvm.org/D65507

llvm-svn: 367535
2019-08-01 09:40:13 +00:00
Sam Parker 7ca8c6f6db [NFC][ARM][ParallelDSP] Getters and renaming
Add a couple of getters for Reduction and do some renaming of
variables around CreateSMLAD for clarity.

llvm-svn: 367522
2019-08-01 08:17:51 +00:00
Craig Topper 388df2ea19 [SelectionDAG] Use APInt::isSubsetOf/intersects to simplify some code.
Also use KnownBits::isNegative/isNonNegative to further simplify.

llvm-svn: 367518
2019-08-01 06:06:21 +00:00
Tom Stellard 7a2958bc20 AMDGPU/SILoadStoreOptimizer: Make some functions const
Reviewers: arsenm, pendingchaos, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65316

llvm-svn: 367517
2019-08-01 05:39:17 +00:00
Zi Xuan Wu 66c320908b recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target. 
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.

Differential Revision: https://reviews.llvm.org/D65063

llvm-svn: 367516
2019-08-01 05:26:02 +00:00
Matt Arsenault 9952f46407 AMDGPU/GlobalISel: Fix flat load/store of pointer types
llvm-svn: 367513
2019-08-01 03:57:42 +00:00
Matt Arsenault 57495268ac AMDGPU/GlobalISel: Remove manual store select code
This regresses the weird types that are newly treated as legal load
types, but fixes incorrectly using flat instrucions on SI.

llvm-svn: 367512
2019-08-01 03:52:40 +00:00
Matt Arsenault ae87b9f2c2 AMDGPU/GlobalISel: Select local atomic cmpxchg
llvm-svn: 367511
2019-08-01 03:41:41 +00:00
Matt Arsenault 26cb53b260 AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD
llvm-svn: 367509
2019-08-01 03:33:15 +00:00
Matt Arsenault da5b9bfa95 AMDGPU/GlobalISel: Allow selection of DS atomicrmw
llvm-svn: 367507
2019-08-01 03:29:01 +00:00
Matt Arsenault e6ce48422c AMDGPU: Start redefining atomic PatFrags
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.

llvm-svn: 367506
2019-08-01 03:25:52 +00:00
Matt Arsenault 70e20c0f08 AMDGPU: Correct FP atomic patterns
These need to use an fadd, not an add. Also make the noret part clear
in the name.

llvm-svn: 367505
2019-08-01 03:22:40 +00:00
Matt Arsenault 3baf4d3418 AMDGPU/GlobalISel: Select simple local stores
llvm-svn: 367504
2019-08-01 03:09:15 +00:00
Matt Arsenault 7bedceb5b2 GlobalISel: moreElementsVector for G_LOAD/G_STORE
AMDGPU change and test is a placeholder until a future patch with
complete handling.

llvm-svn: 367503
2019-08-01 01:44:22 +00:00
Peter Collingbourne fbc563e2cb Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections.
This allows functions and globals to to be reordered later in the linking phase
(using the -symbol-ordering-file) even though reordering will be limited to
the scope of the explicit section.

Patch by Rahman Lavaee!

Differential Revision: https://reviews.llvm.org/D65478

llvm-svn: 367501
2019-08-01 01:38:53 +00:00
Matt Arsenault d48324ff6f Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634

llvm-svn: 367500
2019-08-01 01:25:27 +00:00
Philip Reames 79c27c9464 Fix a release-only build warning triggered by rL367485
llvm-svn: 367499
2019-08-01 01:16:08 +00:00
Matt Arsenault 3594011de0 AMDGPU/GlobalISel: Select local loads
llvm-svn: 367498
2019-08-01 00:53:38 +00:00
Amy Huang 153f20057c Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and
and partial fix.
Causes windows buildbot errors.

This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and
53da7ca943.

llvm-svn: 367496
2019-07-31 23:59:31 +00:00
Eli Friedman 89b80f1239 [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1.
This is extremely specific, but saves three instructions when it's
legal.  I don't think the code can be usefully generalized.

Differential Revision: https://reviews.llvm.org/D65351

llvm-svn: 367492
2019-07-31 23:19:21 +00:00
Eli Friedman 2f45ec1c39 [ARM] Transform compare of masked value to shift on Thumb1.
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.

It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.

Differential Revision: https://reviews.llvm.org/D65175

llvm-svn: 367491
2019-07-31 23:17:34 +00:00
Craig Topper b70026c43c [ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches.
X86 at least is able to use movmsk or kmov to move the mask to the scalar
domain. Then we can just use test instructions to test individual bits.

This is more efficient than extracting each mask element
individually.

I special cased v1i1 to use the previous behavior. This avoids
poor type legalization of bitcast of v1i1 to i1.

I've skipped expandload/compressstore as I think we need to
handle constant masks for those better first.

Many tests end up with duplicate test instructions due to tail
duplication in the branch folding pass. But the same thing
happens when constructing similar code in C. So its not unique
to the scalarization.

Not sure if this lowering code will also be good for other targets,
but we're only testing X86 today.

Differential Revision: https://reviews.llvm.org/D65319

llvm-svn: 367489
2019-07-31 22:58:15 +00:00
Craig Topper b51dc64063 [X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store
We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it.

This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stoers or a extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions its causing in our branch that I wasn't seeing on trunk.

Differential Revision: https://reviews.llvm.org/D65538

llvm-svn: 367488
2019-07-31 22:43:08 +00:00
Michael Berg 005d705d43 Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context.

Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper

Reviewed By: spatel

Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D65170

llvm-svn: 367486
2019-07-31 21:57:28 +00:00
Philip Reames f8e7b53657 [IndVars, RLEV] Support rewriting exit values in loops without known exits (prep work)
This is a prepatory patch for future work on support exit value rewriting in loops with a mixture of computable and non-computable exit counts.  The intention is to be "mostly NFC" - i.e. not enable any interesting new transforms - but in practice, there are some small output changes.

The test differences are caused by cases wherewhere getSCEVAtScope can simplify a single entry phi without needing any knowledge of the loop.

llvm-svn: 367485
2019-07-31 21:15:21 +00:00
Amy Huang 27a73dd02c Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
after windows buildbot failure.

Added a check that the MachineInstr exists and is a call before trying
to add symbols around it.

llvm-svn: 367483
2019-07-31 21:03:38 +00:00
Eric Christopher 36fb93982f Fix unused variable warning for non-assert builds.
llvm-svn: 367482
2019-07-31 21:02:03 +00:00
Mark Lacey 641ea2e701 [GISel] Address review feedback on passing MD_callees to lowerCall.
Preserve the nullptr default for KnownCallees that appears in
the base class.

llvm-svn: 367477
2019-07-31 20:34:05 +00:00
Mark Lacey 7b8d3eb9e2 [GISel] Pass MD_callees metadata down in call lowering.
Summary:
This will make it possible to improve IPRA by taking into account
register usage in indirect calls.

NFC yet; this is just laying the groundwork to start building
up patches to take advantage of the information for improved register
allocation.

Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette

Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65488

llvm-svn: 367476
2019-07-31 20:34:02 +00:00
Peter Collingbourne 09f39967a2 AArch64: Add a tagged-globals backend feature.
This feature instructs the backend to allow locally defined global variable
addresses to contain a pointer tag in bits 56-63 that will be ignored by
the hardware (i.e. TBI), but may be used by an instrumentation pass such
as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD
sequence that sets bits 48-63 to the corresponding bits of the global, with
the linker bounds check disabled on the ADRP instruction to prevent the tag
from causing a link failure.

This implementation of the feature omits the MOVK when loading from or storing
to a global, which is sufficient for TBI. If the same approach is extended
to MTE, assuming that 0 is not configured as a catch-all tag, we will most
likely also need the MOVK in this case in order to avoid a tag mismatch.

Differential Revision: https://reviews.llvm.org/D65364

llvm-svn: 367475
2019-07-31 20:14:19 +00:00
Peter Collingbourne 33773d5cfc SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned.
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().

Differential Revision: https://reviews.llvm.org/D65465

llvm-svn: 367474
2019-07-31 20:14:09 +00:00
Wei Mi f49c107f06 [DAGCombine] Limit the number of times for the same store and root nodes
to bail out in store merging dependence check.

We run into a case where dependence check in store merging bail out many times
for the same store and root nodes in a huge basicblock. That increases compile
time by almost 100x. The patch add a map to track how many times the bailing
out happen for the same store and root, and if it is over a limit, stop
considering the store with the same root as a merging candidate.

Differential Revision: https://reviews.llvm.org/D65174

llvm-svn: 367472
2019-07-31 19:59:24 +00:00
Alina Sbirlea 7153f2784c [SCCP] Update condition to avoid overflow.
Summary:
Update condition to remove addition that may cause an overflow.
Resolves PR42814.

Reviewers: sanjoy, RKSimon

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65417

llvm-svn: 367461
2019-07-31 18:22:22 +00:00
Alina Sbirlea 63e97fa0b3 [MemorySSA] Add additional verification for phis.
Summary:
Verify that the incoming defs into phis are the last defs from the
respective incoming blocks.
When moving blocks, insertDef must RenameUses. Adding this verification
makes GVNHoist tests fail that uncovered this issue.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63147

llvm-svn: 367451
2019-07-31 17:41:04 +00:00
Sanjay Patel 435cdecdf7 [InstCombine] canonicalize fneg before fmul/fdiv
Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it
easier to implement the transforms (and possibly other fneg transforms) in
1 place because we can always start the pattern match from fneg (either the
legacy binop or the new unop).

There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914
https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR
(although this change may expose analysis holes in the other direction).

1. The instcombine test changes show the expected neutral IR diffs from
   reversing the order.

2. The reassociation tests show that we were missing an optimization
   opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says
   that all of these transforms are allowed (regardless of binop/unop
   fneg version) because:

   "For all other operations [besides copy/abs/negate/copysign], this
   standard does not specify the sign bit of a NaN result."
   In all of these transforms, we always have some other binop
   (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a
   potential intermediate NaN operand.
   (If that interpretation is wrong, then we must already have a bug in
   the existing transforms?)

3. The clang tests shouldn't exist as-is, but that's effectively a
   revert of rL367149 (the test broke with an extension of the
   pre-existing fneg canonicalization in rL367146).

Differential Revision: https://reviews.llvm.org/D65399

llvm-svn: 367447
2019-07-31 16:53:22 +00:00
Djordje Todorovic b9973f87c6 Reland "[DwarfDebug] Dump call site debug info"
The build failure found after the rL365467 has been
resolved.

Differential Revision: https://reviews.llvm.org/D60716

llvm-svn: 367446
2019-07-31 16:51:28 +00:00
Stanislav Mekhanoshin ba1e845c21 [AMDGPU] Fix for vectorizer crash with pointers of different size
When vectorizer strips pointers it can eventually end up with
pointers of two different sizes, then SCEV will crash.

Differential Revision: https://reviews.llvm.org/D65480

llvm-svn: 367443
2019-07-31 16:33:11 +00:00
Simon Pilgrim 0707f66ad0 [X86] Moved IsNOT helper earlier. NFCI.
Makes it available for more combines to use without adding declarations.

llvm-svn: 367436
2019-07-31 14:36:04 +00:00
Mikhail Maltsev 806231ecc3 [ARM] Reject CSEL instructions with invalid operands
Summary:
According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are
"constrained unpredictable" when SP is used as the source register Rn.

The assembler should diagnose this case.

Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover

Reviewed By: ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65505

llvm-svn: 367433
2019-07-31 14:22:45 +00:00
Florian Hahn fa42f42858 [IPSCCP] Move callsite check to the beginning of the loop.
We have some code marks instructions with struct operands as overdefined,
but if the instruction is a call to a function with tracked arguments,
this breaks the assumption that the lattice values of all call sites
are not overdefined and will be replaced by a constant.

This also re-adds the assertion from D65222, with additionally skipping
non-callsite uses. This patch should address the cases reported in which
the assertion fired.

Fixes PR42738.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65439

llvm-svn: 367430
2019-07-31 12:57:04 +00:00
Simon Pilgrim 24ad2b5e7d [X86][AVX] Ensure chained subvector insertions are the same size (PR42833)
Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.

llvm-svn: 367429
2019-07-31 12:55:39 +00:00
Momchil Velikov a36d31478c [AArch64] Add support for Transactional Memory Extension (TME)
Re-commit r366322 after some fixes

TME is a future architecture technology, documented in

  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
  https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

  https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Differential Revision: https://reviews.llvm.org/D64416

Patch by Javed Absar and Momchil Velikov

llvm-svn: 367428
2019-07-31 12:52:17 +00:00
Roman Lebedev 5e4e6b1fb1 [DivRemPairs] Fixup DNDEBUG build - variable is only used in assertion
llvm-svn: 367423
2019-07-31 12:26:37 +00:00
Roman Lebedev a686c60c45 [DivRemPairs] Recommit: Handling for expanded-form rem - recomposition (PR42673)
Summary:
While `-div-rem-pairs` pass can decompose rem in div+rem pair when div-rem pair
is unsupported by target, nothing performs the opposite fold.
We can't do that in InstCombine or DAGCombine since neither of those has access to TTI.
So it makes most sense to teach `-div-rem-pairs` about it.

If we matched rem in expanded form, we know we will be able to place div-rem pair
next to each other so we won't regress the situation.
Also, we shouldn't decompose rem if we matched already-decomposed form.
This is surprisingly straight-forward otherwise.

The original patch was committed in rL367288 but was reverted in rL367289
because it exposed pre-existing RAUW issues in internal data structures
of the pass; those now have been addressed in a previous patch.

https://bugs.llvm.org/show_bug.cgi?id=42673

Reviewers: spatel, RKSimon, efriedma, ZaMaZaN4iK, bogner

Reviewed By: bogner

Subscribers: bogner, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65298

llvm-svn: 367419
2019-07-31 12:06:51 +00:00
Roman Lebedev 5f616901f5 [DivRemPairs] Avoid RAUW pitfalls (PR42823)
Summary:
`DivRemPairs` internally creates two maps:
* {sign, divident, divisor} -> div instruction
* {sign, divident, divisor} -> rem instruction
Then it iterates over rem map, and looks if there is an entry
in div map with the same key. Then depending on some internal logic
it may RAUW rem instruction with something else.

But if that rem instruction is an input to other div/rem,
then it was used as a key in these maps, so the old value (used in key)
is now dandling, because RAUW didn't update those maps.
And we can't even RAUW map keys in general, there's `ValueMap`,
but we don't have a single `Value` as key...

The bug was discovered via D65298, and the test there exists.
Now, i'm not sure how to expose this issue in trunk.
The bug is clearly there if i change the map keys to be `AssertingVH`/`PoisoningVH`,
but i guess this didn't miscompiled anything thus far?
I really don't think this is benin without that patch.

The fix is actually rather straight-forward - instead of trying to somehow
shoe-horn `ValueMap` here (doesn't fit, key isn't just `Value`), or writing a new
`ValueMap` with key being a struct of `Value`s, we can just have an intermediate
data structure - a vector, each entry containing matching `Div, Rem` pair,
and pre-filling it before doing any modifications.
This way we won't need to query map after doing RAUW, so no bug is possible.

Reviewers: spatel, bogner, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, hans, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65451

llvm-svn: 367417
2019-07-31 12:06:38 +00:00
Oliver Cruickshank 09a1b8172b [ARM] Generate MVE VFMAs
llvm-svn: 367408
2019-07-31 10:44:11 +00:00
Oliver Cruickshank e7241e8592 [NFC] Test Commit
llvm-svn: 367405
2019-07-31 10:08:09 +00:00
Sam Elliott 9e6b2e1605 [RISCV] Support 'f' Inline Assembly Constraint
Summary:
This adds the 'f' inline assembly constraint, as supported by GCC. An
'f'-constrained operand is passed in a floating point register. Exactly
which kind of floating-point register (32-bit or 64-bit) is decided
based on the operand type and the available standard extensions (-f and
-d, respectively).

This patch adds support in both the clang frontend, and LLVM itself.

Reviewers: asb, lewis-revill

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65500

llvm-svn: 367403
2019-07-31 09:45:55 +00:00
Sam Parker 5ea07f7c07 [NFC][ARMCGP] Use switch in isSupportedValue
Use a switch instead of many isa<> while checking for supported
values. Also be explicit about which cast instructions are supported;
This allows the removal of SIToFP from GenerateSignBits.

llvm-svn: 367402
2019-07-31 09:34:11 +00:00
Florian Hahn 189efe295b Recommit "[GVN] Preserve loop related analysis/canonical forms."
This fixes some pipeline tests.
This reverts commit d0b6f42936.

llvm-svn: 367401
2019-07-31 09:27:54 +00:00
Cullen Rhodes 1518c88a7d [AArch64][SVE2] Load/store instruction fixes
Summary:
* Loads and stores in SVE2 are gather/scatter not contiguous, fixed by
  renaming multiclasses to reflect this and also updated comments.
* Remove aliases from load/store multiclasses that reflect the behaviour
  of the original form.
* Fix bug in scatter store implementation, vector list should be used as
  input, not output.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D65392

llvm-svn: 367398
2019-07-31 09:10:36 +00:00
Simon Cook 8d7ec4d644 [RISCV] Add support for lowering floating point inlineasm clobbers
This adds the required extension to RISC-V's getRegForInlineAsmConstraint
in order to be able to correctly distringuish between the 32 and 64-bit
floating point registers when the generic fX name appears in inlineasm
clobber contraints. It also adds a check to validate that callee saved
floating point registers are only saved in this case when a hard-float
ABI is selected.

Differential Revision: https://reviews.llvm.org/D64751

llvm-svn: 367397
2019-07-31 09:07:21 +00:00
Cullen Rhodes 17230e026d [AArch64][SVE2] Minor refactoring and cleanup
Summary:
* Clarify comment with SVE2 for predicated shifts and move next to other
  shift instructions.
* Clarify comments for various instructions.
* Move FCVTX instruction next to other fp conversions.
* Move FLOGB to next to other fp instructions and fix description.
* Remove "cons" from non-constructive multiclass for bitwise shift-right
  and accumulate instructions.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D65390

llvm-svn: 367396
2019-07-31 08:58:16 +00:00
Cullen Rhodes e8eb8b9c3a [AArch64][SVE2] Use destination register as source register
Summary:
This patch fixes a bug in the following instructions that should have been
implemented as destructive. A destructive instruction is an instruction where
one of the source registers also acts as the destination register. Therefore,
the contents of the source register, when the instruction begins execution, are
replaced by the result of the instruction when the instruction completes
execution [1]:

  * SRI/SLI
  * EORBT/EORTB
  * TBX
  * Narrowing top instructions
  * FP convert precision instructions

These changes are non-functional from the assembler/diassembler point-of-view
but are necessary for correct codegen.

[1] https://static.docs.arm.com/ddi0584/ae/DDI0584A_e_SVE_supp_armv8A.pdf

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D65389

llvm-svn: 367394
2019-07-31 08:45:57 +00:00
Sam Parker 2200a9bdf3 [ARM][ParallelDSP] Convert to function pass
Run across a whole function, visiting each basic block one at a time.

Differential Revision: https://reviews.llvm.org/D65324

llvm-svn: 367389
2019-07-31 07:32:03 +00:00
Zi Xuan Wu 54d446f70e revert r367382 because buildbot failure
llvm-svn: 367388
2019-07-31 07:03:42 +00:00
Zi Xuan Wu e85f6bf66c [PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target. 
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.

llvm-svn: 367382
2019-07-31 02:56:00 +00:00
Stanislav Mekhanoshin 2594fa8593 [AMDGPU] Fix high occupancy calculation and print it
We had couple places which still return 10 as a maximum
occupancy. Fixed.

Also print comment about occupancy as compiler see it.

Differential Revision: https://reviews.llvm.org/D65423

llvm-svn: 367381
2019-07-31 01:07:10 +00:00
Amy Huang 53da7ca943 [MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG
Summary: This emits labels around heapallocsite calls in SelectionDAG.

Reviewers: rnk

Subscribers: MatzeB, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61105

llvm-svn: 367374
2019-07-31 00:16:13 +00:00
Matt Arsenault 52c262484f TableGen: Add MinAlignment predicate
AMDGPU uses some custom code predicates for testing alignments.

I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.

llvm-svn: 367373
2019-07-31 00:14:43 +00:00
Francis Visoiu Mistrih 84e80979b5 Reland: [Remarks] Add an LLVM-bitstream-based remark serializer
Add a new serializer, using a binary format based on the LLVM bitstream
format.

This format provides a way to serialize the remarks in two modes:

1) Separate mode: the metadata is separate from the remark entries.
2) Standalone mode: the metadata and the remark entries are in the same
file.

The format contains:

* a meta block: container version, container type, string table,
external file path, remark version
* a remark block: type, remark name, pass name, function name, debug
file, debug line, debug column, hotness, arguments (key, value, debug
file, debug line, debug column)

A string table is required for this format, which will be dumped in the
meta block to be consumed before parsing the remark blocks.

On clang itself, we noticed a size reduction of 13.4x compared to YAML,
and a compile-time reduction of between 1.7% and 3.5% on CTMark.

Differential Revision: https://reviews.llvm.org/D63466

Original llvm-svn: 367364
Revert llvm-svn: 367370

llvm-svn: 367372
2019-07-31 00:13:51 +00:00
Francis Visoiu Mistrih d8e7967a22 Revert "[Remarks] Add an LLVM-bitstream-based remark serializer"
This reverts commit r367364.

Breaks some bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-gn/builds/3161/steps/annotate/logs/stdio

llvm-svn: 367370
2019-07-31 00:01:34 +00:00
Matt Arsenault 9cf980d4a7 GlobalISel: Add G_ATOMICRMW_{FADD|FSUB}
llvm-svn: 367369
2019-07-30 23:56:30 +00:00
Wei Mi 888efda280 [DAGCombiner] Add an option to control whether or not to enable store merging.
Add an option to control whether or not to enable store merging in dag combiner
so we can workaround some bugs more easily.

Differential Revision: https://reviews.llvm.org/D65482

llvm-svn: 367365
2019-07-30 23:14:56 +00:00
Francis Visoiu Mistrih 6c3c9483e7 [Remarks] Add an LLVM-bitstream-based remark serializer
Add a new serializer, using a binary format based on the LLVM bitstream
format.

This format provides a way to serialize the remarks in two modes:

1) Separate mode: the metadata is separate from the remark entries.
2) Standalone mode: the metadata and the remark entries are in the same
file.

The format contains:

* a meta block: container version, container type, string table,
external file path, remark version
* a remark block: type, remark name, pass name, function name, debug
file, debug line, debug column, hotness, arguments (key, value, debug
file, debug line, debug column)

A string table is required for this format, which will be dumped in the
meta block to be consumed before parsing the remark blocks.

On clang itself, we noticed a size reduction of 13.4x compared to YAML,
and a compile-time reduction of between 1.7% and 3.5% on CTMark.

Differential Revision: https://reviews.llvm.org/D63466

llvm-svn: 367364
2019-07-30 23:11:57 +00:00
Craig Topper 8b58371fae [X86] Fix mistake in comment. NFC
The code is matching sext not zext.

llvm-svn: 367357
2019-07-30 21:00:24 +00:00
Stanislav Mekhanoshin 9aff33bb95 [AMDGPU] Print register pressure for agpr and vgpr separately
Differential Revision: https://reviews.llvm.org/D65476

llvm-svn: 367355
2019-07-30 20:45:15 +00:00
Alina Sbirlea 4bc625cae0 [MemorySSA] Extend allowed behavior for simplified instructions.
Summary:
LoopRotate may simplify instructions, leading to the new instructions not having memory accesses created for them.
Allow this behavior, by allowing the new access to be null when the template is null, and looking upwards for the proper defined access when dealing with simplified instructions.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65338

llvm-svn: 367352
2019-07-30 20:10:33 +00:00
Michael Liao f3983cc14a [NVPTX] Fix PR41651
Summary:
- Use the passed `DL` directly as retrieving data layout from CS by
  checking the called function is not reliable. Under indirect function
  call, there is no called function.

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65468

llvm-svn: 367349
2019-07-30 19:52:01 +00:00
Stanislav Mekhanoshin 450afcea39 [AMDGPU] Reserve all AGPRs on targets which do not have them
Differential Revision: https://reviews.llvm.org/D65471

llvm-svn: 367347
2019-07-30 19:29:33 +00:00
Austin Kerbow c99f62e313 [AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization.
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64966

llvm-svn: 367344
2019-07-30 18:49:16 +00:00
Hideto Ueno 6e2be4eab3 [FunctionAttrs] Annotate "willreturn" for AssumeLikeInst
Summary:
In D37215, AssumeLikeInstruction are regarded as `willreturn`. In this patch, annotation is added to those which don't have `willreturn` now(`sideeffect, object_size, experimental_widenable_condition`).

Reviewers: jdoerfert, nikic, sstefan1

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65455

llvm-svn: 367342
2019-07-30 18:35:29 +00:00
Thomas Lively e0a9dce543 [WebAssembly] Do not emit tail calls with return type mismatch
Summary:
return_call and return_call_indirect are only valid if the return
types of the callee and caller match. We were previously not enforcing
that, which was producing invalid modules.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65246

llvm-svn: 367339
2019-07-30 18:08:39 +00:00
Florian Hahn d0b6f42936 Revert [GVN] Preserve loop related analysis/canonical forms.
This reverts r367332 (git commit 2d7227ec3a)

llvm-svn: 367335
2019-07-30 17:04:58 +00:00
Florian Hahn 2d7227ec3a [GVN] Preserve loop related analysis/canonical forms.
LoopInfo can be easily preserved by passing it to the functions that
modify the CFG (SplitCriticalEdge and MergeBlockIntoPredecessor.
SplitCriticalEdge also preserves LoopSimplify and LCSSA form when when passing in
LoopInfo. The test case shows that we preserve LoopSimplify and
LoopInfo. Adding addPreservedID(LCSSAID) did not preserve LCSSA for some
reason.

Also I am not sure if it is possible to preserve those in the new pass
manager, as they aren't analysis passes.

Reviewers: reames, hfinkel, davide, jdoerfert

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D65137

llvm-svn: 367332
2019-07-30 16:43:39 +00:00
Francis Visoiu Mistrih 5ed3d146f8 [Remarks] Add two serialization modes for remarks: separate and standalone
The default mode is separate, where the metadata is serialized
separately from the remarks.

Another mode is the standalone mode, where the metadata is serialized
before the remarks, on the same stream.

llvm-svn: 367328
2019-07-30 16:01:40 +00:00
Kit Barton de0b633999 [LoopFusion] Extend use of OptimizationRemarkEmitter
Summary:
This patch extends the use of the OptimizationRemarkEmitter to provide
information about loops that are not fused, and loops that are not eligible for
fusion. In particular, it uses the OptimizationRemarkAnalysis to identify loops
that are not eligible for fusion and the OptimizationRemarkMissed to identify
loops that cannot be fused.

It also reuses the statistics to provide the messages used in the
OptimizationRemarks. This provides common message strings between the
optimization remarks and the statistics.

I would like feedback on this approach, in general. If people are OK with this,
I will flesh out additional remarks in subsequent commits.

Subscribers: hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63844

llvm-svn: 367327
2019-07-30 15:58:43 +00:00
Matt Arsenault 57ef94fb06 AMDGPU: Avoid emitting "true" predicates
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.

This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.

llvm-svn: 367326
2019-07-30 15:56:43 +00:00
Sean Fertile 39f3503814 Address post commit review comments on revision 366727.
Addresses number of comment made on D64652 after commiting:

- Reorders function decls in the TargetLoweringObjectFileXCOFF class.
- Fix comment in MCSectionXCOFF to include description of external reference
  csects.
- Convert several llvm_unreachables to report_fatal_error
- Convert several dyn_casts to casts as they are expected not to fail.
- Avoid copying DataLayout object.

llvm-svn: 367324
2019-07-30 15:37:01 +00:00
Roman Lebedev be612ea471 [InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.

While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.

There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS

To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).

Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65046

llvm-svn: 367322
2019-07-30 15:28:22 +00:00
Simon Pilgrim b989bc47c0 [X86] SimplifyDemandedVectorEltsForTargetNode should be calling resolveTargetShuffleInputs not getTargetShuffleMask
Add TODO comment.

llvm-svn: 367318
2019-07-30 15:06:09 +00:00
Simon Pilgrim e4d5423dcd [X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819)
PR42819 showed an issue that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source.

This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source. 

llvm-svn: 367306
2019-07-30 11:35:13 +00:00
Sam Parker e3a4a13fcc [ARM][LowOverheadLoops] Enable by default
The code is now in a good enough state to pass the bunch of tests that
I have run (after fixing the bugs), so let's enable it by default.

Differential Revision: https://reviews.llvm.org/D65277

llvm-svn: 367297
2019-07-30 08:14:28 +00:00
Sam Parker ed2ea3e46b [ARM][LowOverheadLoops] Revert non-header LE target
Revert the hardware loop upon finding a LoopEnd that doesn't target
the loop header, instead of asserting a failure.

Differential Revision: https://reviews.llvm.org/D65268

llvm-svn: 367296
2019-07-30 08:08:44 +00:00
Roman Lebedev 8e0cf076ac Revert "[DivRemPairs] Handling for expanded-form rem - recomposition (PR42673)"
test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG broke:

Only PHI nodes may reference their own value!
  %sub33 = srem i32 %sub33, %ranks_in_i

This reverts commit r367288.

llvm-svn: 367289
2019-07-30 07:44:58 +00:00
Roman Lebedev c75cdd056f [DivRemPairs] Handling for expanded-form rem - recomposition (PR42673)
Summary:
While `-div-rem-pairs` pass can decompose rem in div+rem pair when div-rem pair
is unsupported by target, nothing performs the opposite fold.
We can't do that in InstCombine or DAGCombine since neither of those has access to TTI.
So it makes most sense to teach `-div-rem-pairs` about it.

If we matched rem in expanded form, we know we will be able to place div-rem pair
next to each other so we won't regress the situation.
Also, we shouldn't decompose rem if we matched already-decomposed form.
This is surprisingly straight-forward otherwise.

https://bugs.llvm.org/show_bug.cgi?id=42673

Reviewers: spatel, RKSimon, efriedma, ZaMaZaN4iK, bogner

Reviewed By: bogner

Subscribers: bogner, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65298

llvm-svn: 367288
2019-07-30 07:10:00 +00:00
Michael Pozulp 074db9b8e9 Revert "[llvm-objdump] Add warning messages if disassembly + source for problematic inputs"
This reverts r367284 (git commit b1cbe51bdf).
My changes to LLVMSymbolizer caused a test to fail:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/29488

llvm-svn: 367286
2019-07-30 07:05:27 +00:00
Michael Pozulp b1cbe51bdf [llvm-objdump] Add warning messages if disassembly + source for problematic inputs
Summary: Addresses https://bugs.llvm.org/show_bug.cgi?id=41905

Reviewers: jhenderson, rupprecht, grimar

Reviewed By: jhenderson, grimar

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62462

llvm-svn: 367284
2019-07-30 05:28:26 +00:00
Alex Lorenz 9e38f4d973 [FileCollector] Add a VFS that records FS accesses using the FileCollector
This patch adds a VFS that can be overlaid on top of another VFS
to record file system accesses using the FileCollector.
This can help to gather files that are needed for reproducers.

Differential Revision: https://reviews.llvm.org/D65411

llvm-svn: 367278
2019-07-29 23:38:30 +00:00
Vedant Kumar d01ae675af [IR] Consolidate fixed metadata kind definitions (NFC)
Put the list of fixed metadata kinds in one place.

Testing: check-llvm with+without LLVM_ENABLE_MODULES=On

Differential Revision: https://reviews.llvm.org/D64437

llvm-svn: 367257
2019-07-29 20:24:20 +00:00
Jinsong Ji 5bb6202c44 [PowerPC][NFC]Fix a typo in comment.
llvm-svn: 367252
2019-07-29 19:27:54 +00:00
Craig Topper 479b45411e [X86] Fix typo in comment. We're looking at a right shift not a left shift. NFC
llvm-svn: 367251
2019-07-29 19:22:51 +00:00
Francis Visoiu Mistrih 72d00802d8 [Remarks] Update error message format string
All the clang-cmake-armv{7,8} bots are failing this test. This is an
attempt to fix this.

llvm-svn: 367243
2019-07-29 17:40:34 +00:00
Peter Collingbourne dd9682196b ThinLTOBitcodeWriter: Include globals associated with type metadata globals in the merged module.
Globals that are associated with globals with type metadata need to appear
in the merged module because they will reference the global's section directly.

Differential Revision: https://reviews.llvm.org/D65312

llvm-svn: 367242
2019-07-29 17:22:40 +00:00
Simon Pilgrim 962c03fac4 [X86] resolveTargetShuffleInputs - add depth to limit recursion.
Avoids slow downs from calls to ComputeNumSignBits/computeKnownBits going too deep.

llvm-svn: 367240
2019-07-29 17:17:58 +00:00
Tom Stellard cc0bc941d4 AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr:mayAlias() returns true when an instruction has multiple
MachineMemOperands.

This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65036

llvm-svn: 367237
2019-07-29 16:40:58 +00:00
Jay Foad 3bdcedbf3d [AMDGPU] Fix typo in error message
llvm-svn: 367235
2019-07-29 16:17:13 +00:00
Simon Pilgrim 5ab948f823 [X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI.
As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different to computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.

This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.

llvm-svn: 367232
2019-07-29 15:57:06 +00:00
Francis Visoiu Mistrih d42289e291 [RISCV] Fix uninitialized variable after call to evaluateConstantImm
For llvm/test/MC/RISCV/rv64i-aliases-invalid.s, UBSan reports:

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9: runtime error:
load of value 3879186881, which is not a valid value for type
'RISCVMCExpr::VariantKind'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9 in

It turns out that evaluateConstantImm does not set `VK` and it remains
unitialized when doing comparisons in `isImmXLenLI()`.

Differential Revision: https://reviews.llvm.org/D65347

llvm-svn: 367230
2019-07-29 15:52:13 +00:00
Sanjay Patel e9ee7b47d4 [InstCombine] fold fadd+fneg with fdiv/fmul betweena
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367227
2019-07-29 13:50:25 +00:00
Hideto Ueno 98d281a99f [ValueTracking] Remove volatile check in isGuaranteedToTransferExecutionToSuccessor
Summary: As clarified in D53184, volatile load and store do not trap. Therefore, we should remove volatile checks for instructions in  `isGuaranteedToTransferExecutionToSuccessor`.

Reviewers: jdoerfert, efriedma, nikic

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65375

llvm-svn: 367226
2019-07-29 13:35:34 +00:00
Sanjay Patel 5483f4225e [InstCombine] reduce code for fadd with fneg operand; NFC
llvm-svn: 367224
2019-07-29 13:20:46 +00:00
Simon Pilgrim f8a7e9de06 [DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI.
This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious.

llvm-svn: 367220
2019-07-29 11:34:45 +00:00
Jay Foad dcb7532479 [DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

llvm-svn: 367218
2019-07-29 10:22:09 +00:00
Sam Parker 414dd1c946 [NFC][ARM[ParallelDSP] Cleanup of BinOpChain
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
  MulCandidate, as it's the only value we care about. This means we
  don't have to perform any ugly (and unnecessary) iterations of the 
  list later on.

llvm-svn: 367208
2019-07-29 08:41:51 +00:00
David Stuttard 20235ef3e7 [AMDGPU] Enable v4f16 and above for v_pk_fma instructions
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)

Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65325

Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
2019-07-29 08:15:10 +00:00
Sam Parker 8538060103 [NFC][ARM][ParallelDSP] Remove AreSymmetrical
We explicitly search for a parallel mac and we only care about its
inputs, checking for symmetry doesn't add anything here.

llvm-svn: 367205
2019-07-29 08:12:24 +00:00
Sam Parker 11ad33ede6 [NFC][ARM][ParallelDSP] Remove PopulateLoads
We no longer have to check what loads are used, all this
is performed at the start of the transform, so it's not
doing anything now.

llvm-svn: 367204
2019-07-29 08:07:23 +00:00
Craig Topper eb1beabad9 [X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user.
The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.

I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.

There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.

llvm-svn: 367198
2019-07-29 01:36:58 +00:00
Craig Topper 894916cac9 [X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit.
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.

llvm-svn: 367195
2019-07-28 18:45:42 +00:00
Sanjay Patel 99c57c6daf [InstCombine] fold fsub+fneg with fdiv/fmul between
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367194
2019-07-28 17:10:06 +00:00
David Green b8b8b46a51 [ARM] MVE VPNOT
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.

Differential Revision: https://reviews.llvm.org/D65133

llvm-svn: 367192
2019-07-28 14:07:48 +00:00
David Green 9cf344e739 [ARM] Better patterns for fp <> predicate vectors
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.

Differential Revision: https://reviews.llvm.org/D65103

llvm-svn: 367191
2019-07-28 13:53:39 +00:00
Hideto Ueno e7bea9b73a [Attributor] Deduce "align" attribute
Summary:
Deduce "align" attribute in attributor.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64152

llvm-svn: 367187
2019-07-28 07:04:01 +00:00
Hideto Ueno afd4a37b2a [IR] Fix getPointerAlignment for CallBase
Summary:
In current getPointerAlignemnt implementation, CallBase.getPointerAlignement(..) checks only parameter attriutes in the callsite.  For example,

```
declare align 8 i8* @foo()

define void @bar() {
    %a = tail call align 8 i8* @foo() ; getPointerAlignment returns 8
    %b = tail call i8* @foo() ; getPointerAlignemnt returns 0
    ret void
}
```

This patch will fix the problem.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65281

llvm-svn: 367185
2019-07-28 06:17:46 +00:00
Simon Pilgrim 76f2f04d9d [DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI.
If the subvector binop is illegal then early-out and avoid the subvector searches.

llvm-svn: 367181
2019-07-27 19:42:58 +00:00
Simon Pilgrim 603f94aa2a [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied)
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.

This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.

Fixes PR42777

llvm-svn: 367174
2019-07-27 14:11:59 +00:00
Sanjay Patel 02b9e45a7e [InstSimplify] remove quadratic time looping (PR42771)
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).

There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.

This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.

Differential Revision: https://reviews.llvm.org/D65336

llvm-svn: 367173
2019-07-27 14:05:51 +00:00
Simon Pilgrim 353a848473 [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied)
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.

llvm-svn: 367172
2019-07-27 13:30:29 +00:00
Simon Pilgrim 8a52671782 [SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.

This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. 

This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.

llvm-svn: 367171
2019-07-27 12:48:46 +00:00
Simon Pilgrim 3ff6126487 [TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.

llvm-svn: 367169
2019-07-27 12:23:36 +00:00
Amara Emerson 7bc4fad0fb [AArch64][GlobalISel] Implement narrowing of G_SEXT.
We need this to narrow a sext to s128.

Differential Revision: https://reviews.llvm.org/D65357

llvm-svn: 367164
2019-07-26 23:46:38 +00:00
Jessica Paquette aa8b9993c2 [AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers
Add partial instruction selection for intrinsics like this:

```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```

(This only handles the case where a G_ZEXT is feeding the intrinsic.)

Also make sure that the added store instruction actually has the memory op from
the original G_STORE.

Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D65355

llvm-svn: 367163
2019-07-26 23:28:53 +00:00
Francis Visoiu Mistrih f5a338369b [Remarks] Silence Wreturn-type warning
Shows up here: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/27771/steps/annotate/logs/stdio.

llvm-svn: 367162
2019-07-26 22:42:54 +00:00
Florian Hahn d89f6cb299 Revert [IPSCCP] Add assertion to surface cases where we zap returns with overdefined users.
This reverts r366998 (git commit 5354c83ece)

This breaks a linux kernel build and we have reproducer to investigate.

llvm-svn: 367160
2019-07-26 22:14:08 +00:00
Francis Visoiu Mistrih 64a5f9e112 Reland: [Remarks] Support parsing remark metadata in the YAML remark parser
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.

This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.

Original llvm-svn: 367148
Revert llvm-svn: 367151

This has a fix for gcc builds.

llvm-svn: 367155
2019-07-26 21:02:02 +00:00
Wei Mi 55a68a2400 [JumpThreading] Stop searching predecessor when the current bb is in a
unreachable loop.

updatePredecessorProfileMetadata in jumpthreading tries to find the
first dominating predecessor block for a PHI value by searching upwards
the predecessor block chain.

But jumpthreading may see some temporary IR state which contains
unreachable bb not being cleaned up. If an unreachable loop happens to
be on the predecessor block chain, keeping chasing the predecessor
block will run into an infinite loop.

The patch fixes it.

Differential Revision: https://reviews.llvm.org/D65310

llvm-svn: 367154
2019-07-26 20:59:22 +00:00
Francis Visoiu Mistrih cdc74e2197 Revert "[Remarks] Support parsing remark metadata in the YAML remark parser"
This reverts r367148.

Seems to fail on
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/27768.

llvm-svn: 367151
2019-07-26 20:54:44 +00:00
Francis Visoiu Mistrih a41f61625a [Remarks] Support parsing remark metadata in the YAML remark parser
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.

This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.

llvm-svn: 367148
2019-07-26 20:11:53 +00:00
Sanjay Patel a9ab31558c [InstCombine] canonicalize negated operand of fdiv
This is a transform that we use with fmul, so use
it for fdiv too for consistency.

llvm-svn: 367146
2019-07-26 19:56:59 +00:00
Vlad Tsyrklevich 485b8789de Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler."
This reverts r367100, it appears to be causing test failures after
Nico's revert of r367091.

llvm-svn: 367141
2019-07-26 18:14:21 +00:00
Sean Fertile 9df6177d38 [PowerPC][AIX]Add lowering of MCSymbol MachineOperand.
Adds machine operand lowering for MCSymbolSDNodes to the PowerPC
backend. This is needed to produce call instructions in assembly for AIX
because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for
asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.

Differential Revision: https://reviews.llvm.org/D63738

llvm-svn: 367133
2019-07-26 17:25:27 +00:00
Michael Liao 711556e6a8 [AMDGPU] Fix typo.
llvm-svn: 367131
2019-07-26 17:13:59 +00:00
Cullen Rhodes 2cde8b5db6 [AArch64][SVE2] Rename bitperm feature to sve2-bitperm
Summary:
The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2
extensions

Patch by Maciej Gabka.

Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin

Reviewed By: SjoerdMeijer, rengolin

Differential Revision: https://reviews.llvm.org/D65327

llvm-svn: 367124
2019-07-26 15:57:50 +00:00
Nico Weber 13f337c4cb Revert r367091, it caused PR42777.
llvm-svn: 367118
2019-07-26 14:58:42 +00:00
Sam Parker 3da59e5513 [ARM][ParallelDSP] Combine structs
Combine OpChain and BinOpChain structs as OpChain is a base class to
BinOpChain that is never used.

llvm-svn: 367114
2019-07-26 14:11:40 +00:00
Sean Fertile 9bd22fec0d [PowerPC] Add getCRSaveOffset to improve readability. [NFC]
In preperation for AIX support in FrameLowering: replace a number of literal
'8' that represent the stack offset of the condition register save area with
a member in PPCFrameLowering.

Patch by Chris Bowler.

llvm-svn: 367111
2019-07-26 14:02:17 +00:00
Petar Avramovic cf21794566 [MIPS GlobalISel] Fix check for void return during lowerCall
Void return used to have unsigned with value 0 for virtual register
but with addition of Register class and changes to arguments to lowerCall
this is no longer valid.
Check for void return by inspecting the Ty field in OrigRet.

Differential Revision: https://reviews.llvm.org/D65321

llvm-svn: 367107
2019-07-26 13:19:37 +00:00
Carl Ritson 0b28357053 [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65328

llvm-svn: 367105
2019-07-26 13:11:44 +00:00
Petar Avramovic b1fc6f6130 [MIPS GlobalISel] Select inttoptr and ptrtoint
Select G_INTTOPTR and G_PTRTOINT for MIPS32.

Differential Revision: https://reviews.llvm.org/D65217

llvm-svn: 367104
2019-07-26 13:08:06 +00:00
Sanjay Patel c229cfeb7a [InstCombine] remove flop from lerp patterns
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)

This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716

Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.

Differential Revision: https://reviews.llvm.org/D65305

llvm-svn: 367101
2019-07-26 11:19:18 +00:00
Simon Pilgrim d93e8ece7b [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler.
This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits.

llvm-svn: 367100
2019-07-26 11:10:20 +00:00
Sam Parker 7440065bd8 [NFC][ARM][ParallelDSP] Cleanup isNarrowSequence
Remove unused logic.

llvm-svn: 367099
2019-07-26 10:57:42 +00:00
Simon Pilgrim a424a1f351 [SelectionDAG] GetDemandedBits - update SIGN_EXTEND_INREG op to just call SimplifyMultipleUseDemandedBits.
llvm-svn: 367098
2019-07-26 10:03:07 +00:00
Carl Ritson 00e89b428b [AMDGPU] Add llvm.amdgcn.softwqm intrinsic
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64935

llvm-svn: 367097
2019-07-26 09:54:12 +00:00
Simon Pilgrim 9758407bf1 [TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support.
llvm-svn: 367096
2019-07-26 09:41:08 +00:00
Momchil Velikov 898d953693 [AArch64] Define ETE and TRBE system registers
Embedded Trace Extension and Trace Buffer Extension are optional
future architecture extensions.
(cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)

Their system registers are documented here:
https://developer.arm.com/docs/ddi0601/a

ETE shares register names with ETM. One exception is the ETE
TRCEXTINSELR0 register, which has the same encoding as the ETM
TRCEXTINSELR register (but different semantics). This patch treats
them as aliases: the assembler will accept both names, emitting
identical encoding, and the disassembler will keep disassembling
to TRCEXRINSELR.

Differential Revision: https://reviews.llvm.org/D63707

llvm-svn: 367093
2019-07-26 09:19:08 +00:00
Simon Pilgrim d0164fc525 [SelectionDAG] GetDemandedBits - update OR/XOR ops to just call SimplifyMultipleUseDemandedBits.
Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all.

llvm-svn: 367092
2019-07-26 09:13:29 +00:00
Simon Pilgrim b32ceb79b0 [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support.
This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back.

llvm-svn: 367091
2019-07-26 08:38:39 +00:00
Sam Parker c760b5da11 [ARM][LowOverheadLoops] Add CPSR defs
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.

Differential Revision: https://reviews.llvm.org/D65275

llvm-svn: 367089
2019-07-26 08:15:01 +00:00
Pengfei Wang 9ad565f70e [WinEH] Allocate space in funclets stack to save XMM CSRs
Summary:
This is an alternate approach to D57970.
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.

Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper,
RKSimon

Subscribers: rnk, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63396

Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 367088
2019-07-26 07:33:15 +00:00
Serguei Katkov 7f8c809592 [Loop Utils] Extend the scope of addStringMetadataToLoop.
To avoid duplicates in loop metadata, if the string to add is
already there, just update the value.

Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D65265

llvm-svn: 367087
2019-07-26 07:04:34 +00:00
Serguei Katkov 3c3a76527e [Loop Utils] Move utilty addStringMetadataToLoop to LoopUtils.cpp. NFC.
Just move the utility function to LoopUtils.cpp to re-use it in loop peeling.

Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, asbirlea, llvm-commits
Differential Revision: https://reviews.llvm.org/D65264

llvm-svn: 367085
2019-07-26 06:10:08 +00:00
Yi Kong 1755abe1fb Fix macOS build after r358716
COPYFILE_CLONE is only defined on newer macOS versions, using it without
check breaks build on systems running legacy OS and toolchain.

Differential Revision: https://reviews.llvm.org/D65317

llvm-svn: 367084
2019-07-26 05:17:14 +00:00
Kang Zhang 4e794a8bae Some case eror for: detected memory leaks
llvm-svn: 367083
2019-07-26 03:25:58 +00:00
Matt Arsenault a9ea8a9aae AMDGPU/GlobalISel: Handle most function return types
handleAssignments gives up pretty easily on structs, and i8 values for
some reason. The other case that doesn't work is when an implicit sret
needs to be inserted if the return size exceeds the number of return
registers.

llvm-svn: 367082
2019-07-26 02:36:05 +00:00
Kang Zhang 5c61015455 [PowerPC] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:
In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. 

Below is an example
```
BB:                   | BB:
   XOR 3, 3, 4        |   XOR 3, 3, 4
   B TBB              |   B ChainBB
...                   | ...
ChainBB:              | ChainBB:
   B TBB              |   ADD 3, 3, 4
...                   |   BLR
TBB:                  |
   ADD 3, 3, 4        |
   BLR                |
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D63972

llvm-svn: 367080
2019-07-26 01:58:53 +00:00
Francis Visoiu Mistrih 2d8fdcae96 Reland: [Remarks] Add support for serializing metadata for every remark streamer
This allows every serializer format to implement metaSerializer() and
return the corresponding meta serializer.

Original llvm-svn: 366946
Reverted llvm-svn: 367004

This fixes the unit tests on Windows bots.

llvm-svn: 367078
2019-07-26 01:33:30 +00:00
Amara Emerson c07fe307b4 [AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC.
llvm-svn: 367075
2019-07-26 00:01:09 +00:00
Francis Visoiu Mistrih 0503add6da [CodeGen] Don't resolve the stack protector frame accesses until PEI
Currently, stack protector loads and stores are resolved during
LocalStackSlotAllocation (if the pass needs to run). When this is the
case, the base register assigned to the frame access is going to be one
of the vregs created during LocalStackSlotAllocation. This means that we
are keeping a pointer to the stack protector slot, and we're using this
pointer to load and store to it.

In case register pressure goes up, we may end up spilling this pointer
to the stack, which can be a security concern.

Instead, leave it to PEI to resolve the frame accesses. In order to do
that, we make all stack protector accesses go through frame index
operands, then PEI will resolve this using an offset from sp/fp/bp.

Differential Revision: https://reviews.llvm.org/D64759

llvm-svn: 367068
2019-07-25 22:23:48 +00:00
Yonghong Song 329abf2939 [BPF] fix typedef issue for offset relocation
Currently, the CO-RE offset relocation does not work
if any struct/union member or array element is a typedef.
For example,
  typedef const int arr_t[7];
  struct input {
      arr_t a;
  };
  func(...) {
       struct input *in = ...;
       ... __builtin_preserve_access_index(&in->a[1]) ...
  }
The BPF backend calculated default offset is 0 while
4 is the correct answer. Similar issues exist for struct/union
typedef's.

When getting struct/union member or array element type,
we should trace down to the type by skipping typedef
and qualifiers const/volatile as this is what clang did
to generate getelementptr instructions.
(const/volatile member type qualifiers are already
ignored by clang.)

This patch fixed this issue, for each access index,
skipping typedef and const/volatile/restrict BTF types.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65259

llvm-svn: 367062
2019-07-25 21:47:27 +00:00
Alex Lorenz b680422ef8 [FileCollector] add support for recording empty directories
The file collector class is useful for constructing reproducers by
creating a snapshot of the files that are accessed. Sometimes it might
also be important to construct directories that don't necessarily have files,
but are still accessed by some tool that we want to make a reproducer for.
This is useful for instance for modeling the behavior of Clang's header search,
which scans through a number of directories it doesn't actually access when
looking for framework headers. This commit extends the file collector to allow
it to work with paths that are just directories, by constructing them as the
files are copied over.

Differential Revision: https://reviews.llvm.org/D65297

llvm-svn: 367061
2019-07-25 21:47:11 +00:00
Amara Emerson e54dc6b8b5 [AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943.
Changes the order of legalization of G_ICMP suggested by Petar in D65079.

llvm-svn: 367060
2019-07-25 21:44:52 +00:00
Leonard Chan 007f674c6a Reland the "[NewPM] Port Sancov" patch from rL365838. No functional
changes were made to the patch since then.

--------

[NewPM] Port Sancov

This patch contains a port of SanitizerCoverage to the new pass manager. This one's a bit hefty.

Changes:

- Split SanitizerCoverageModule into 2 SanitizerCoverage for passing over
  functions and ModuleSanitizerCoverage for passing over modules.
- ModuleSanitizerCoverage exists for adding 2 module level calls to initialization
  functions but only if there's a function that was instrumented by sancov.
- Added legacy and new PM wrapper classes that own instances of the 2 new classes.
- Update llvm tests and add clang tests.

llvm-svn: 367053
2019-07-25 20:53:15 +00:00
Florian Hahn c74808b914 [PredicateInfo] Replace pointer comparisons with deterministic compares.
Currently there are a few pointer comparisons in ValueDFS_Compare, which
can cause non-deterministic ordering when materializing values. There
are 2 cases this patch fixes:

1. Order defs before uses used to compare pointers, which guarantees
   defs before uses, but causes non-deterministic ordering between 2
   uses or 2 defs, depending on the allocation order. By converting the
   pointers to booleans, we can circumvent that problem.

2. comparePHIRelated was comparing the basic block pointers of edges,
   which also results in a non-deterministic order and is also not
   really meaningful for ordering. By ordering by their destination DFS
   numbers we guarantee a deterministic order.

For the example below, we can end up with 2 different uselist orderings,
when running `opt -mem2reg -ipsccp` hundreds of times. Because the
non-determinism is caused by allocation ordering, we cannot reproduce it
with ipsccp alone.

    declare i32 @hoge() local_unnamed_addr #0

    define dso_local i32 @ham(i8* %arg, i8* %arg1) #0 {
    bb:
      %tmp = alloca i32
      %tmp2 = alloca i32, align 4
      br label %bb19

    bb4:                                              ; preds = %bb20
      br label %bb6

    bb6:                                              ; preds = %bb4
      %tmp7 = call i32 @hoge()
      store i32 %tmp7, i32* %tmp
      %tmp8 = load i32, i32* %tmp
      %tmp9 = icmp eq i32 %tmp8, 912730082
      %tmp10 = load i32, i32* %tmp
      br i1 %tmp9, label %bb11, label %bb16

    bb11:                                             ; preds = %bb6
      unreachable

    bb13:                                             ; preds = %bb20
      br label %bb14

    bb14:                                             ; preds = %bb13
      %tmp15 = load i32, i32* %tmp
      br label %bb16

    bb16:                                             ; preds = %bb14, %bb6
      %tmp17 = phi i32 [ %tmp10, %bb6 ], [ 0, %bb14 ]
      br label %bb19

    bb18:                                             ; preds = %bb20
      unreachable

    bb19:                                             ; preds = %bb16, %bb
      br label %bb20

    bb20:                                             ; preds = %bb19
      indirectbr i8* null, [label %bb4, label %bb13, label %bb18]
    }

Reviewers: davide, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D64866

llvm-svn: 367049
2019-07-25 20:48:13 +00:00
Serguei Katkov cde00c02e1 [Loop Peeling] Fix idom detection algorithm.
We'd like to determine the idom of exit block after peeling one iteration.
Let Exit is exit block.
Let ExitingSet - is a set of predecessors of Exit block. They are exiting blocks.
Let Latch' and ExitingSet' are copies after a peeling.
We'd like to find an idom'(Exit) - idom of Exit after peeling.
It is an evident that idom'(Exit) will be the nearest common dominator of ExitingSet and ExitingSet'.
idom(Exit) is a nearest common dominator of ExitingSet.
idom(Exit)' is a nearest common dominator of ExitingSet'.
Taking into account that we have a single Latch, Latch' will dominate Header and idom(Exit).
So the idom'(Exit) is nearest common dominator of idom(Exit)' and Latch'.
All these basic blocks are in the same loop, so what we find is
(nearest common dominator of idom(Exit) and Latch)'.

Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D65292

llvm-svn: 367044
2019-07-25 19:31:50 +00:00
Sanjay Patel b456310902 [SimplifyCFG] avoid crashing after simplifying a switch (PR42737)
Later code in TryToSimplifyUncondBranchFromEmptyBlock() assumes that
we have cleaned up unreachable blocks, but that was not happening
with this switch transform.

llvm-svn: 367037
2019-07-25 17:01:12 +00:00
JF Bastien cbeff368fc Make GCC happy about attribute location
It doesn't like function attributes on definitions, only declarations.

llvm-svn: 367036
2019-07-25 16:58:15 +00:00
JF Bastien 463e9bdfa9 Fix unused function from r367031
llvm-svn: 367035
2019-07-25 16:50:10 +00:00
Whitney Tsang 8ee361ebe5 [LOOPINFO] Introduce the loop guard API.
Summary:
This is the first patch for the loop guard. We introduced
getLoopGuardBranch() and isGuarded().
This currently only works on simplified loop, as it requires a preheader
and a latch to identify the guard.
It will work on loops of the form:
/// GuardBB:
///   br cond1, Preheader, ExitSucc <== GuardBranch
/// Preheader:
///   br Header
/// Header:
///  ...
///   br Latch
/// Latch:
///   br cond2, Header, ExitBlock
/// ExitBlock:
///   br ExitSucc
/// ExitSucc:
Prior discussions leading upto the decision to introduce the loop guard
API: http://lists.llvm.org/pipermail/llvm-dev/2019-May/132607.html
Reviewer: reames, kbarton, hfinkel, jdoerfert, Meinersbur, dmgreen
Reviewed By: reames
Subscribers: wuzish, hiraditya, jsji, llvm-commits, bmahjour, etiotto
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63885

llvm-svn: 367033
2019-07-25 16:13:18 +00:00
JF Bastien dbc0a5df8d Allow prefetching from non-zero address spaces
Summary:
This is useful for targets which have prefetch instructions for non-default address spaces.

<rdar://problem/42662136>

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, jkorous, dexonsmith, cfe-commits, llvm-commits, RKSimon, hfinkel, t.p.northover, craig.topper, anemet

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65254

llvm-svn: 367032
2019-07-25 16:11:57 +00:00
JF Bastien eb3c1ca896 CrashHandler: be careful about crashing while handling
Summary:
Looking at the current Apple-specific code for crash handling it does a few
silly things that I think we should avoid while handling crashes:

  * Try real hard not to allocate.
  * Set the global crash reporter string early so that any crash while
    generating the stack trace will still report some info.
  * Prevent reordering of operations in the current thread.

<rdar://problem/53503334>

Subscribers: hiraditya, jkorous, dexonsmith, llvm-commits, beanz, Bigcheese, thakis, lattner, jordan_rose

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65235

llvm-svn: 367031
2019-07-25 16:07:41 +00:00
Yonghong Song d8efec97be [BPF] fix CO-RE incorrect index access string
Currently, we expect the CO-RE offset relocation records
a string encoding the original getelementptr access index,
so kernel bpf loader can decode it correctly.

For example,
  struct s { int a; int b; };
  struct t { int c; int d; };
  #define _(x) (__builtin_preserve_access_index(x))
  int get_value(const void *addr1, const void *addr2);
  int test(struct s *arg1, struct t *arg2) {
    return get_value(_(&arg1->b), _(&arg2->d));
  }

We expect two offset relocations:
  reloc 1: type s, access index 0, 1
  reloc 2: type t, access index 0, 1

Two globals are created to retain access indexes for the
above two relocations with global variable names.
The first global has a name "0:1:". Unfortunately,
the second global has the name "0:1:.1" as the llvm
internals automatically add suffix ".1" to a global
with the same name. Later on, the BPF peels the last
character and record "0:1" and "0:1:." in the
relocation table.

This is not desirable. BPF backend could use the global
variable suffix knowledge to generate correct access str.
This patch rather took an approach not relying on
that knowledge. It generates "s:0:1:" and "t:0:1:" to
avoid global variable suffixes and later on generate
correct index access string "0:1" for both records.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65258

llvm-svn: 367030
2019-07-25 16:01:26 +00:00
Vlad Tsyrklevich 5d5a58317c Revert "[InstCombine] try to narrow a truncated load"
This reverts commit bc4a63fd3c, this is a
speculative revert to fix a number of sanitizer bots (like
sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2
compiler crashes, presumably due to a miscompile.

llvm-svn: 367029
2019-07-25 15:37:57 +00:00
Florian Hahn c0d0e3bda8 [PredicateInfo] Use SmallVector instead of SmallPtrSet.
We do not need the SmallPtrSet to avoid adding duplicates to
OpsToRename, because we already keep a ValueInfo mapping. If we see an
op for the first time, Infos will be empty and we can also add it to
OpsToRename.

We process operands by visiting BBs depth-first and then iterate over
all instructions & users, so the order should be deterministic.
Therefore we can skip one round of sorting, which we purely needed for
guaranteeing a deterministic order when iterating over the SmallPtrSet.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D64816

llvm-svn: 367028
2019-07-25 15:35:10 +00:00
Michael Liao 53f967f2bd [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
  which is consumed by pseudo structurized CFG instructions. When that
  PHIs are eliminated in O0, COPY may be placed wrongly as the these
  pseudo structurized CFG instructions are considering prologue of MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
  PHIs.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64353

llvm-svn: 367023
2019-07-25 14:50:18 +00:00
Momchil Velikov a655f476b0 [AArch64][SVE] Allow explicit size specifier for predicate operand
... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue
supporting the exsting behaviour of not requiring an explicit size
specifier. The preferred disasembly is *with* the specifier.

This is implemented by redefining intruction forms to require vector predicates
with explicit size and adding aliases, which allow a predicate with no size.

Differential Revision: https://reviews.llvm.org/D65145

llvm-svn: 367019
2019-07-25 13:56:04 +00:00
Matt Arsenault a85af76c72 AMDGPU: Don't assert on v4f16 arguments to shader calling conventions
llvm-svn: 367018
2019-07-25 13:55:07 +00:00
Sanjay Patel 38a0200868 [Utils] remove duplicated documentation comments; NFC
http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

llvm-svn: 367015
2019-07-25 13:11:21 +00:00
Simon Pilgrim 447fe31964 [X86] concatSubVectors - remove unnecessary args. NFCI.
All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper.

llvm-svn: 367014
2019-07-25 13:05:46 +00:00
Sanjay Patel bc4a63fd3c [InstCombine] try to narrow a truncated load
trunc (load X) --> load (bitcast X to narrow type)

We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.

Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.

We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.

Differential Revision: https://reviews.llvm.org/D64432

llvm-svn: 367011
2019-07-25 12:14:27 +00:00
Pablo Barrio 275954539d [ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1
Summary:
Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1.
Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the
Arm architecture. Neoverse N1 implements both AArch32 and AArch64.

Cortex-A65:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65

Cortex-A65AE:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae

Neoverse E1:
https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1

Neoverse N1:
https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1

Patch by Diogo Sampaio and Pablo Barrio

Reviewers: samparker, LukeCheeseman, sbaranga, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64406

llvm-svn: 367007
2019-07-25 10:59:45 +00:00
Simon Pilgrim 55fd57ba95 Revert rL366946 : [Remarks] Add support for serializing metadata for every remark streamer
This allows every serializer format to implement metaSerializer() and
return the corresponding meta serializer.
........
Fix windows build bots
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win

llvm-svn: 367004
2019-07-25 10:20:39 +00:00
Fangrui Song 058858851c [MC] Delete unused MCInstPrinter::markup overload and getPrintHexStyle
llvm-svn: 367000
2019-07-25 09:54:12 +00:00
Florian Hahn 5354c83ece [IPSCCP] Add assertion to surface cases where we zap returns with overdefined users.
We should only zap returns in functions, where all live users have a
replace-able value (are not overdefined). Unused return values should be
undefined.

This should make it easier to detect bugs like in PR42738.

Alternatively we could bail out of zapping the function returns, but I
think it would be better to address those divergences between function
and call-site values where they are actually caused.

Reviewers: davide, efriedma

Reviewed By: davide, efriedma

Differential Revision: https://reviews.llvm.org/D65222

llvm-svn: 366998
2019-07-25 09:37:09 +00:00
Kai Luo 985e52a4c1 [PowerPC][NFC] Make `getDefMIPostRA` public
llvm-svn: 366995
2019-07-25 08:36:44 +00:00
Sjoerd Meijer 5c606cef79 [LV] Scalar Epilogue Lowering. NFC.
This refactors boolean 'OptForSize' that was passed around in a lot of places.
It controlled folding of the tail loop, the scalar epilogue, into the main loop
but code-size reasons may not be the only reason to do this. Thus, this is a
first step to generalise the concept of tail-loop folding, and hence OptForSize
has been renamed and is using an enum ScalarEpilogueStatus that holds the
status how the epilogue should be lowered.

This will be followed up by D65197, that picks up the predicate loop hint and
performs the tail-loop folding.

Differential Revision: https://reviews.llvm.org/D64916

llvm-svn: 366993
2019-07-25 08:06:02 +00:00
Kai Luo 5c8af53806 [PowerPC][NFC] Added `getDefMIPostRA` method
Summary:
In PostRA phase, we often have to find out the most recent definition
of a register.  This patch adds getDefMIPostRA so that other methods
can use it rather than implementing it repeatedly.

Differential Revision: https://reviews.llvm.org/D65131

llvm-svn: 366990
2019-07-25 07:47:52 +00:00
Seiya Nuta 21277e3ec2 [MC] Add MCInstrAnalysis::evaluateMemoryOperandAddress
Summary:
Add a new method which tries to compute the target address referenced by an operand.

This patch supports x86_64 RIP-relative addressing for now.

It is necessary to print referenced symbol names in llvm-objdump.

Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper

Reviewed By: MaskRay, craig.topper

Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63847

llvm-svn: 366987
2019-07-25 06:57:09 +00:00
Chen Zheng a2d74d3d90 [PowerPC] exclude more icmps in LSR which is converted in later hardware loop pass
Differential Revision: https://reviews.llvm.org/D64795

llvm-svn: 366976
2019-07-25 01:22:08 +00:00
Shoaib Meenai a67f6f1746 [Object] Add public MaxSectionAlignment to MachOUniversal
Change MAXSECTALIGN to a public MaxSectionAlignment in MachOUniversal.
Will be used in a follow-up.

Patch by Anusha Basana <anusha.basana@gmail.com>

Differential Revision: https://reviews.llvm.org/D65117

llvm-svn: 366969
2019-07-25 00:29:13 +00:00
Jonas Devlieghere eb1b4c5d4c [FileCollector] Change coding style from LLDB to LLVM (NFC)
This patch changes the coding style of the FileCollector from the LLDB
to the LLVM coding style. Alex recently lifted it into LLVM and I
volunteered to do the conversion.

llvm-svn: 366966
2019-07-25 00:17:39 +00:00
Francis Visoiu Mistrih ab56cf8914 [Remarks][NFC] Rename remarks::Parser to remarks::RemarkParser
llvm-svn: 366965
2019-07-25 00:16:56 +00:00
Eli Friedman 82e109279d [ARM] Remove dead code from ARMConstantIslands.
tLDRHi is not a pc-relative load; it can't directly refer to a
constant pool or jump table.

llvm-svn: 366963
2019-07-24 23:36:14 +00:00
Evandro Menezes 5cd5f9b65d [InstCombine] Swap order of checks to improve compile time (NFC)
llvm-svn: 366962
2019-07-24 23:31:04 +00:00
Jessica Paquette 728b18f29f [AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP
Before, we weren't able to select things like this for G_GEP:

add	x0, x8, #8

And instead we'd materialize the 8.

This teaches GISel to do that. It gives some considerable code size savings
on 252.eon-- about 4%!

Differential Revision: https://reviews.llvm.org/D65248

llvm-svn: 366959
2019-07-24 23:11:01 +00:00
Amara Emerson de81bd0faa [AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp.
Throughout the legalizerinfo we currently make the assumption that the target
has neon and FP target features available. Fixing it will require a refactor of
the whole thing, so until then make sure we fall back.

Works around PR42734

Differential Revision: https://reviews.llvm.org/D65244

llvm-svn: 366957
2019-07-24 23:00:04 +00:00
Alex Lorenz 86814bf658 [Support] move FileCollector from LLDB to llvm/Support
The file collector class is useful for creating reproducers,
not just for LLDB, but for other tools as well in LLVM/Clang.

Differential Revision: https://reviews.llvm.org/D65237

llvm-svn: 366956
2019-07-24 22:59:20 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Jessica Paquette 68499112cf [AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible
If we have a G_MUL, and either the LHS or the RHS of that mul is the legal
shift value for a load addressing mode, we can fold it into the load.

This gives some code size savings on some SPEC tests. The best are around 2%
on 300.twolf and 3% on 254.gap.

Differential Revision: https://reviews.llvm.org/D65173

llvm-svn: 366954
2019-07-24 22:49:42 +00:00
Peter Collingbourne 72391ab4f1 IR: Teach GlobalIndirectSymbol::getBaseObject() to handle more kinds of expressions.
For aliases, any expression that lowers at the MC level to global_object or
global_object+constant is valid at the object file level. getBaseObject()
should return a result if the aliasee ends up being of that form even if
the IR used to produce it is somewhat unconventional.

Note that this is different from what stripInBoundsOffsets() and that family
of functions is doing. Those functions are concerned about semantic properties
of IR, whereas here we only care about the lowering result.

Therefore reimplement getBaseObject() in a way that matches the lowering
result. This fixes a crash when producing a summary for aliases such as
that in the included test case.

Differential Revision: https://reviews.llvm.org/D65115

llvm-svn: 366952
2019-07-24 22:23:05 +00:00
Amara Emerson 13af1ed8e3 [GlobalISel] Support for inlining memcpy, memset and memmove calls.
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.

The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.

Differential Revision: https://reviews.llvm.org/D65167

llvm-svn: 366951
2019-07-24 22:17:31 +00:00
Sanjay Patel 86e9f9dc26 [Transforms] move copying of load metadata to helper function; NFC
There's another proposed load combine that can make use of this code
in D64432.

llvm-svn: 366949
2019-07-24 22:11:11 +00:00
Francis Visoiu Mistrih 62388e3846 [Remarks] Add support for serializing metadata for every remark streamer
This allows every serializer format to implement metaSerializer() and
return the corresponding meta serializer.

llvm-svn: 366946
2019-07-24 21:29:44 +00:00
Craig Topper e9abc8177a [InstCombine] Teach foldOrOfICmps to allow icmp eq MIN_INT/MAX to be part of a range comparision. Similar for foldAndOfICmps
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.

We can do simliar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there.

Fixes PR42691.

Differential Revision: https://reviews.llvm.org/D65017

llvm-svn: 366945
2019-07-24 20:57:29 +00:00
Amara Emerson a1997ce2e5 [AArch64][GlobalISel] Fix a crash during s128 G_ICMP legalization due to r366317.
r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard
code the result type of the new legalized G_SELECT. Change this to instead use
type of the original G_ICMP result and allow the target to legalize it if necessary
later.

llvm-svn: 366943
2019-07-24 20:46:42 +00:00
David Bolvansky d2904ccf88 Let CorrelatedValuePropagation preserve LazyValueInfo
Summary:
This patch makes CorrelatedValuePropagation preserve LazyValueInfo by adding LazyValueInfo::eraseValue & calling it whenever an instruction is erased.

Passes `make check` , test-suite, and SPECrate 2017.

Patch by aqjune (Juneyoung Lee)

Reviewers: reames, mzolotukhin

Reviewed By: reames

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59349

llvm-svn: 366942
2019-07-24 20:27:32 +00:00
Francis Visoiu Mistrih ff4b515a77 [Remarks][NFC] Rename remarks::Serializer to remarks::RemarkSerializer
llvm-svn: 366939
2019-07-24 19:47:57 +00:00
Stanislav Mekhanoshin c43784ff26 [AMDGPU] Increase kernel padding
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.

Differential Revision: https://reviews.llvm.org/D65236

llvm-svn: 366938
2019-07-24 19:40:13 +00:00
Simon Pilgrim 2bf871be4c Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 366935
2019-07-24 17:44:22 +00:00
David Green cd7a6fa314 [ARM] Rewrite how VCMP are lowered, using a single node
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.

Differential Revision: https://reviews.llvm.org/D65072

llvm-svn: 366934
2019-07-24 17:36:47 +00:00
Simon Pilgrim 7d318b2bb1 [DAGCombine] matchBinOpReduction - add partial reduction matching
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.

Differential Revision: https://reviews.llvm.org/D65047

llvm-svn: 366933
2019-07-24 17:29:56 +00:00
David Green 047a0b6575 [ARM] Disable MVE fptosi and friends
The prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.

Differential Revision: https://reviews.llvm.org/D65066

llvm-svn: 366931
2019-07-24 17:26:26 +00:00
Jessica Paquette c19c30776a [AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec
Fix an off-by-one error which made us not look at the last element of the
zero vector. This caused a miscompile in 188.ammp.

Differential Revision: https://reviews.llvm.org/D65168

llvm-svn: 366930
2019-07-24 17:18:51 +00:00
David Green b342bddbe2 [ARM] More MVE compare vector splat combines for ANDs
Adds some extra r register compare combines, this time for ANDs.

Differential Revision: https://reviews.llvm.org/D65062

llvm-svn: 366928
2019-07-24 17:08:09 +00:00
David Green 93b5f61295 [ARM] MVE compare vector splat combine
MVE VCMP instructions can use a general purpose register as the second operand.
This adds the combines for it, selecting from a compare of a vdup.

Differential Revision: https://reviews.llvm.org/D65061

llvm-svn: 366924
2019-07-24 16:58:41 +00:00
Simon Pilgrim 3f01c7197f [SelectionDAG] makeEquivalentMemoryOrdering - early out for equal chains (PR42727)
If we are already using the same chain for the old/new memory ops then just return.

Fixes PR42727 which had getLoad() reusing an existing node.

llvm-svn: 366922
2019-07-24 16:53:14 +00:00
Dmitry Preobrazhensky 5e1dd02c90 [AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D65216

llvm-svn: 366921
2019-07-24 16:50:17 +00:00
David Green bab4d8ac5a [ARM] Better OR's for MVE compares
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.

Differential Revision: https://reviews.llvm.org/D65059

llvm-svn: 366920
2019-07-24 16:42:09 +00:00
Francis Visoiu Mistrih c5cc9efa07 [Remarks] Simplify the creation of remark serializers
Introduce two new functions to create a serializer, and add support for
more combinations to the YAMLStrTabSerializer.

llvm-svn: 366919
2019-07-24 16:36:35 +00:00
Stanislav Mekhanoshin 5cdacea297 [AMDGPU] Add all vgpr classes to asm parser
Differential Revision: https://reviews.llvm.org/D65158

llvm-svn: 366917
2019-07-24 16:21:18 +00:00
Matt Arsenault 0e7d8698b5 AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.

llvm-svn: 366915
2019-07-24 16:05:53 +00:00
Sanjay Patel 10dad95a75 [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC
We canonicalize to the add form, so create that directly for efficiency.

llvm-svn: 366914
2019-07-24 15:43:50 +00:00
Anton Afanasyev 4fdcabf259 [Support] Fix `-ftime-trace-granularity` option
Summary:
Move `-ftime-trace-granularity` option to frontend options. Without patch
this option is showed up in the help for any tool that links libSupport.

Reviewers: sammccall

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65202

llvm-svn: 366911
2019-07-24 14:55:40 +00:00
David Green 69fba7434e [ARM] Better AND's for MVE compares
Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where
the second vcmp becomes predicated on the first.

The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the
VPTBlockPass.

Differential Revision: https://reviews.llvm.org/D65058

llvm-svn: 366910
2019-07-24 14:42:05 +00:00
David Green 4fc78c496e [ARM] MVE floating point compares and selects
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.

Some original code by David Sherwood

Differential Revision: https://reviews.llvm.org/D65054

llvm-svn: 366909
2019-07-24 14:28:22 +00:00
David Green a4a4698c16 [ARM] Basic And/Or/Xor handling for MVE predicates
This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It
does this by going into and out of GPRs, doing the operation on scalars.

Code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65053

llvm-svn: 366907
2019-07-24 14:17:54 +00:00
Simi Pallipurath 724888af45 [ARM] Make sure that the constant pool does not keep in the middle of an IT block.
This change make sure that llvm does not emit an invalid IT block
by putting the constant pool in the middle of an IT block.

We have code to try to avoid putting a constant island in the middle of an
IT block, but it only works if we see an IT between the one currently
referencing CPE and possible insertion point. If the first instruction
we look at is the VLDRD after the IT , we never see the IT and does not
realize that the instruction doing the load could be in an IT block itself.

Differential Revision: https://reviews.llvm.org/D64621

Change-Id: I24cecb37cded75e8992870bd997f6226853bd920
llvm-svn: 366905
2019-07-24 13:54:14 +00:00
Sjoerd Meijer a19f5a76e6 Test commit. NFC.
Removed 2 trailing whitespaces in 2 files that used to be in different
repos to test my new github monorepo workflow.

llvm-svn: 366904
2019-07-24 13:30:36 +00:00
Jay Foad 565c54320e [InstSimplify] Rename SimplifyFPUnOp and SimplifyFPBinOp
Summary:
SimplifyFPBinOp is a variant of SimplifyBinOp that lets you specify
fast math flags, but the name is misleading because both functions
can simplify both FP and non-FP ops. Instead, overload SimplifyBinOp
so that you can optionally specify fast math flags.

Likewise for SimplifyFPUnOp.

Reviewers: spatel

Reviewed By: spatel

Subscribers: xbolva00, cameron.mcinally, eraman, hiraditya, haicheng, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64902

llvm-svn: 366902
2019-07-24 12:50:10 +00:00
Thomas Preud'homme 4cd9b853b5 FileCheck [8/12]: Define numeric var from expr
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.

This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.

It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.

It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60388

> llvm-svn: 366860

llvm-svn: 366897
2019-07-24 12:38:22 +00:00
David Green c7e55d4f52 [ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.

Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.

Some code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65052

llvm-svn: 366890
2019-07-24 11:51:36 +00:00
Igor Kudrin 3daefb0744 [DWARF][NFC] Add constants for reserved values of an initial length field.
Differential Revision: https://reviews.llvm.org/D65039

llvm-svn: 366887
2019-07-24 11:34:29 +00:00
David Green b9d96ceca0 [ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).

VPSEL is also added here, simply selecting on the vselect.

Original code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65051

llvm-svn: 366885
2019-07-24 11:08:14 +00:00
Sam Parker aeb21b96a0 [ARM][ParallelDSP] Fix pointer operand reordering
While combining two loads into a single load, we often need to
reorder the pointer operands for the new load. This reordering was
broken in the cases where there was a chain of values that built up
the pointer.

Differential Revision: https://reviews.llvm.org/D65193

llvm-svn: 366881
2019-07-24 09:38:39 +00:00
Haojian Wu 509ad30d85 [Remark] Suppress the "-Wreturn-type" compiler warning, NFC
llvm-svn: 366874
2019-07-24 07:55:01 +00:00
Thomas Preud'homme 5ecb880241 Revert "FileCheck [8/12]: Define numeric var from expr"
This reverts commit 1b05977538.

llvm-svn: 366872
2019-07-24 07:32:34 +00:00
Chen Zheng 8b7e82be12 [PowerPC][NFC] use opcode instead of MachineInstr for instrHasImmForm().
llvm-svn: 366867
2019-07-24 04:50:23 +00:00
Fangrui Song 305ace7cc8 [AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857
llvm-svn: 366866
2019-07-24 01:59:44 +00:00
Petr Hosek 8b161bacf4 [SafeStack] Insert the deref before remaining elements
This is a follow up to D64971. While we need to insert the deref after
the offset, it needs to come before the remaining elements in the
original expression since the deref needs to happen before the LLVM
fragment if present.

Differential Revision: https://reviews.llvm.org/D65172

llvm-svn: 366865
2019-07-24 00:16:23 +00:00
Francis Visoiu Mistrih 4287c95b08 [Remarks] String tables should be move-only
Copying them is expensive. This allows the tables to be moved around at
lower cost, and allows a remarks::StringTable to be constructed from
a remarks::ParsedStringTable.

llvm-svn: 366864
2019-07-23 22:50:08 +00:00
Thomas Preud'homme 1b05977538 FileCheck [8/12]: Define numeric var from expr
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.

This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.

It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.

It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60388

llvm-svn: 366860
2019-07-23 22:41:38 +00:00
Jonas Devlieghere f8552e67e9 [DWARF] Use 32-bit format specifier for offset
This should fix PR42730.

llvm-svn: 366859
2019-07-23 22:34:21 +00:00
Amara Emerson 511f7f5785 [AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs.
We need to be able to load and store s128 for memcpy inlining, where we want to
generate Q register mem ops. Making these legal also requires that we add some
support in other instructions. Regbankselect should also know about these since
they have no GPR register class that can hold them, so need special handling to
live on the FPR bank.

Differential Revision: https://reviews.llvm.org/D65166

llvm-svn: 366857
2019-07-23 22:05:13 +00:00
Simon Pilgrim 78b1e777f5 Fix "control reaches end of non-void function" warning. NFCI.
llvm-svn: 366856
2019-07-23 21:59:48 +00:00
Jessica Paquette a2fae1e3e9 [GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR
The condition can never be fed by FPRs, so it should always be on a GPR.

Differential Revision: https://reviews.llvm.org/D65157

llvm-svn: 366854
2019-07-23 21:39:50 +00:00
Eli Friedman b27fc95e89 [ARM] Add opt-bisect support to ARMParallelDSP.
llvm-svn: 366851
2019-07-23 20:48:46 +00:00
Francis Visoiu Mistrih c5b5cc4575 [Remarks] Introduce a new format: yaml-strtab
This exposes better support to use a string table with a format through
an actual new remark::Format, called yaml-strtab.

This can now be used with -fsave-optimization-record=yaml-strtab.

llvm-svn: 366849
2019-07-23 20:42:46 +00:00
Francis Visoiu Mistrih cbbdc41838 [Remarks][NFC] Move the YAML serializer to its own header
llvm-svn: 366842
2019-07-23 19:28:03 +00:00
Yi-Hong Lyu 41a010a4ef [PowerPC] Remove redundant load immediate instructions
Currently PowerPC backend emits code like this:

  r3 = li 0
  std r3, 264(r1)
  r3 = li 0
  std r3, 272(r1)

This patch fixes that and other cases where a register already contains a value that is loaded so we will get:

  r3 = li 0
  std r3, 264(r1)
  std r3, 272(r1)

Differential Revision: https://reviews.llvm.org/D64220

llvm-svn: 366840
2019-07-23 19:11:07 +00:00
Craig Topper 76bc3d6e07 [X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2.
LegalizeDAG tries to legal the DAG by legalizing nodes before
their operands.

If we create a new node, we end up legalizing it after its operands.
This prevents some of the optimizations that can be done when the
operand is a build_vector since the build_vector will have been
legalized to something else.

Differential Revision: https://reviews.llvm.org/D65132

llvm-svn: 366835
2019-07-23 18:46:15 +00:00
Francis Visoiu Mistrih 78c92d2ec3 [Remarks] Add unit tests for YAML serialization
Add tests for both the string table and non string table case.

llvm-svn: 366832
2019-07-23 18:09:12 +00:00
Philip Reames ea5c94b497 [IndVars] Fix a subtle bug in optimizeLoopExits
The original code failed to account for the fact that one exit can have a pointer exit count without all of them having pointer exit counts.  This could cause two separate bugs:
1) We might exit the loop early, and leave optimizations undone.  This is what triggered the assertion failure in the reported test case.
2) We might optimize one exit, then exit without indicating a change.  This could result in an analysis invalidaton bug if no other transform is done by the rest of indvars.

Note that the pointer exit counts are a really fragile concept.  They show up only when we have a pointer IV w/o a datalayout to provide their size.  It's really questionable to me whether the complexity implied is worth it.

llvm-svn: 366829
2019-07-23 17:45:11 +00:00
Ryan Taylor 6f13637a3e [IR][Verifier] Allow IntToPtrInst to be !dereferenceable
Summary:
Allow IntToPtrInst to carry !dereferenceable metadata tag.
This is valid since !dereferenceable can be only be applied to
pointer type values.

Change-Id: If8a6e3c616f073d51eaff52ab74535c29ed497b4

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64954

llvm-svn: 366826
2019-07-23 17:19:56 +00:00
Jessica Paquette 2b404d01e8 [GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes
When we select the XRO variants of loads, we can pull in very specific shifts
(of the size of an element). E.g.

```
ldr x1, [x2, x3, lsl #3]
```

This teaches GISel to handle these when they're coming from shifts
specifically.

This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`
which recognizes this pattern.

This also packs this up with `selectAddrModeRegisterOffset` into
`selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO`
in AArch64ISelDAGtoDAG.

Also update load-addressing-modes to show that all of the cases here work.

Differential Revision: https://reviews.llvm.org/D65119

llvm-svn: 366819
2019-07-23 16:09:42 +00:00
Simon Pilgrim 0e8359aec1 [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.

The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.

llvm-svn: 366817
2019-07-23 15:35:55 +00:00
Owen Reynolds 24f3e102a6 [llvm-ar] Fix support for archives with members larger than 4GB
llvm-ar outputs a strange error message when handling archives with
members larger than 4GB due to not checking file size when passing the
value as an unsigned 32 bit integer. This overflow issue caused
malformed archives to be created.:

https://bugs.llvm.org/show_bug.cgi?id=38058

This change allows for members above 4GB and will error in a case that
is over the formats size limit, a 10 digit decimal integer.

Differential Revision: https://reviews.llvm.org/D65093

llvm-svn: 366813
2019-07-23 14:44:21 +00:00
Sam Parker 57e87dd81b [ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
    
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.

Differential Revision: https://reviews.llvm.org/D64616

llvm-svn: 366809
2019-07-23 14:08:46 +00:00
Simon Pilgrim c60c12fb10 Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.
llvm-svn: 366808
2019-07-23 14:04:54 +00:00
Simon Pilgrim 5d4bb8628c [SLPVectorizer] Revert local change that got accidently got committed in rL366799
This wasn't part of D63281

llvm-svn: 366807
2019-07-23 13:42:01 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Simon Pilgrim 87adcf8c47 [SLPVectorizer] Remove null-pointer test. NFCI.
cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers

llvm-svn: 366793
2019-07-23 10:51:43 +00:00
David Green fdedf240f8 [ARM] Rename NEONModImm to VMOVModImm. NFC
Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE.

llvm-svn: 366790
2019-07-23 09:19:24 +00:00
Hideto Ueno 9f5d80d79c [Attributor][NFC] Re-run clang-format on the Attributor.cpp
llvm-svn: 366789
2019-07-23 08:29:22 +00:00
Hideto Ueno 19c07afe17 [Attributor] Deduce "dereferenceable" attribute
Summary:
Deduce dereferenceable attribute in Attributor.

These will be added in a later patch.
* dereferenceable(_or_null)_globally (D61652)
* Deduction based on load instruction (similar to D64258)

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64876

llvm-svn: 366788
2019-07-23 08:16:17 +00:00
Craig Topper a658cb0b12 [DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.

llvm-svn: 366779
2019-07-23 05:13:39 +00:00
Craig Topper f5247244f2 [DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI
llvm-svn: 366778
2019-07-23 05:13:35 +00:00
Robert Widmann fcf3c55a8c [LLVM-C] Improve Bindings to The Internalize Pass
Summary: Adds a binding to the internalize pass that allows the caller to pass a function pointer that acts as the visibility-preservation predicate.  Previously, one could only pass an unsigned value (not LLVMBool?) that directed the pass to consider "main" or not.

Reviewers: whitequark, deadalnix, harlanhaskins

Reviewed By: whitequark, harlanhaskins

Subscribers: kren1, hiraditya, llvm-commits, harlanhaskins

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62456

llvm-svn: 366777
2019-07-23 04:56:44 +00:00
Zi Xuan Wu 57d17ec2e1 [PowerPC] Replace float load/store pair with integer load/store pair when it's only used in load/store
Replace float load/store pair with integer load/store pair when it's only used in load/store,
because float load/store instructions cost more cycles then integer load/store.

A typical scenario is when there is a call with more than 13 float arguments passing, we need pass them by stack.
So we need a load/store pair to do such memory operation if the variable is global variable.

Differential Revision: https://reviews.llvm.org/D64195

llvm-svn: 366775
2019-07-23 03:34:40 +00:00
Richard Trieu 3a52c3857f Inline function call into assert to fix unused variable warning.
llvm-svn: 366774
2019-07-23 03:10:06 +00:00
Richard Trieu 81a5045cd6 Move variable out from debug only section.
MFI is no longer just needed for an assert.  Move it out of the debug only
section to allow non-assert builds to be able to find it.

llvm-svn: 366773
2019-07-23 02:59:15 +00:00
Stefan Stipanovic 6058b86373 Fixing build error from commit 95cbc3d
[Attributor] Liveness analysis.

Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D64162

llvm-svn: 366769
2019-07-22 23:58:23 +00:00
Philip Reames 2f5543aa72 [Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack
We were silently using the ABI alignment for all of the stores generated for deopt and gc values.  We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to a unaligned location.

The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help.  So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.

Note that there's a separate performance possibility here.  Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths.  But that's a later patch, if at all.  :)

llvm-svn: 366765
2019-07-22 23:33:18 +00:00
Jonas Devlieghere 0e7ba06e82 [DWARF] Add more error handling to debug line parser.
This patch exnteds the error handling in the debug line parser to get
rid of the existing MD5 assertion. I want to reuse the debug line parser
from LLVM in LLDB where we cannot crash on invalid input.

Differential revision: https://reviews.llvm.org/D64544

llvm-svn: 366762
2019-07-22 23:23:34 +00:00
Stefan Stipanovic 5a9ba27c71 Revert "Fixing build error from commit 9285295."
This reverts commit 95cbc3da88.

llvm-svn: 366759
2019-07-22 22:55:05 +00:00
Peter Collingbourne 710605c085 Analysis: Don't look through aliases when simplifying GEPs.
It is not safe in general to replace an alias in a GEP with its aliasee
if the alias can be replaced with another definition (i.e. via strong/weak
resolution (linkonce_odr) or via symbol interposition (default visibility
in ELF)) while the aliasee cannot. An example of how this can go wrong is
in the included test case.

I was concerned that this might be a load-bearing misoptimization (it's
possible for us to use aliases to share vtables between base and derived
classes, and on Windows, vtable symbols will always be aliases in RTTI
mode, so this change could theoretically inhibit trivial devirtualization
in some cases), so I built Chromium for Linux and Windows with and without
this change. The file sizes of the resulting binaries were identical, so it
doesn't look like this is going to be a problem.

Differential Revision: https://reviews.llvm.org/D65118

llvm-svn: 366754
2019-07-22 22:13:46 +00:00
Stefan Stipanovic 95cbc3da88 Fixing build error from commit 9285295.
[Attributor] Liveness analysis.

Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, llvm-commits

Differential revision: https://reviews.llvm.org/D64162

llvm-svn: 366753
2019-07-22 22:10:59 +00:00
Roman Lebedev 3a94765bfc [NFC][PatternMatch] Refactor code into a proper "matcher for any integral constant"
Having it as a proper matcher is better for reusability elsewhere
(in a follow-up patch.)

llvm-svn: 366752
2019-07-22 22:09:24 +00:00
Matt Arsenault 827427f65b AMDGPU: Don't use SDNodeXForm for DS offset output
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.

This allows global isel to import the simple cases once the complex
pattern is implemented.

llvm-svn: 366743
2019-07-22 21:38:11 +00:00
Eric Christopher 77dc6d2479 Temporarily Revert "[Attributor] Liveness analysis." as it's breaking the build.
This reverts commit 9285295f75.

llvm-svn: 366737
2019-07-22 21:04:23 +00:00
Stefan Stipanovic 9285295f75 [Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, llvm-commits

Differential revision: https://reviews.llvm.org/D64162

llvm-svn: 366736
2019-07-22 20:54:30 +00:00
Craig Topper 510e6fadaa [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.

While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.

llvm-svn: 366732
2019-07-22 19:58:49 +00:00
Jason Liu 8dd563ef4b [NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention
Summary:

Since we are planning to add ADDIStocHA for 32bit in later patch, we decided
 to change 64bit one first to follow naming convention with 8 behind opcode.

Patch by: Xiangling_L

Differential Revision: https://reviews.llvm.org/D64814

llvm-svn: 366731
2019-07-22 19:55:33 +00:00
Stefan Stipanovic 69ebb02001 [Attributor] NoAlias on return values.
Porting function return value attribute noalias to attributor.
This will be followed with a patch for callsite and function argumets.

Reviewers: jdoerfert

Subscribers: lebedev.ri, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D63067

llvm-svn: 366728
2019-07-22 19:36:27 +00:00
Sean Fertile 942537d9fa Stubs out TLOF for AIX and add support for common vars in assembly output.
Stubs out a TargetLoweringObjectFileXCOFF class, implementing only
SelectSectionForGlobal for common symbols. Also adds an override of
EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors
and adds support for emitting common globals.

llvm-svn: 366727
2019-07-22 19:15:29 +00:00
Petr Hosek f6cd6ffbc9 [SafeStack] Insert the deref after the offset
While debugging code that uses SafeStack, we've noticed that LLVM
produces an invalid DWARF. Concretely, in the following example:

  int main(int argc, char* argv[]) {
    std::string value = "";
    printf("%s\n", value.c_str());
    return 0;
  }

DWARF would describe the value variable as being located at:

  DW_OP_breg14 R14+0, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus

The assembly to get this variable is:

  leaq    -32(%r14), %rbx

The order of operations in the DWARF symbols is incorrect in this case.
Specifically, the deref is incorrect; this appears to be incorrectly
re-inserted in repalceOneDbgValueForAlloca.

With this change which inserts the deref after the offset instead of
before it, LLVM produces correct DWARF:

  DW_OP_breg14 R14-32

Differential Revision: https://reviews.llvm.org/D64971

llvm-svn: 366726
2019-07-22 18:52:42 +00:00
Peter Collingbourne ef5cfc2dae WholeProgramDevirt: Teach the pass to respect the global's alignment.
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.

This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.

There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.

Differential Revision: https://reviews.llvm.org/D65031

llvm-svn: 366725
2019-07-22 18:50:45 +00:00
Sean Fertile 324d33dd4e [PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC]
Patch by Xiangling Liao.

llvm-svn: 366724
2019-07-22 18:47:59 +00:00
Sean Fertile 8034daca5f [Object][XCOFF] Remove extra includes from XCOFF related files. [NFC]
Differential Revision: https://reviews.llvm.org/D60885

llvm-svn: 366723
2019-07-22 18:47:55 +00:00
Peter Collingbourne c3b8661df5 LowerTypeTests: Teach the pass to respect global alignments.
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.

Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.

Differential Revision: https://reviews.llvm.org/D65033

llvm-svn: 366722
2019-07-22 18:47:03 +00:00
Nilanjana Basu 06b8fe8d03 Changes to emit CodeView debug info nested type records properly using MCStreamer directives
llvm-svn: 366720
2019-07-22 18:22:55 +00:00
Simon Pilgrim 3ebd2fe91a [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI.
llvm-svn: 366712
2019-07-22 17:57:36 +00:00
Vlad Tsyrklevich 5874a28ac5 Revert "Reland [ELF] Loose a condition for relocation with a symbol"
This reverts commit r366686 as it appears to be causing buildbot
failures on sanitizer-x86_64-linux-android and sanitizer-x86_64-linux.

llvm-svn: 366708
2019-07-22 17:48:53 +00:00
Matt Arsenault 542720b2bc TableGen: Support physical register inputs > 255
This was truncating register value that didn't fit in unsigned char.
Switch AMDGPU sendmsg intrinsics to using a tablegen pattern.

llvm-svn: 366695
2019-07-22 15:02:34 +00:00
Sam Parker 4379a40088 [ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.

Differential Revision: https://reviews.llvm.org/D65080

llvm-svn: 366691
2019-07-22 14:16:40 +00:00
Nikola Prica 0166cff09b Reland [ELF] Loose a condition for relocation with a symbol
This patch was not the reason of the buildbot failure.

Deleted code was introduced as a work around for a bug in the gold linker
(http://sourceware.org/PR16794). Test case that was given as a reason for
this part of code, the one on previous link, now works for the gold.
This condition is too strict and when a code is compiled with debug info
it forces generation of numerous relocations with symbol for architectures
that do not have relocation addend.

Reviewers: arsenm, espindola

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D64327

llvm-svn: 366686
2019-07-22 13:07:01 +00:00
Matt Arsenault 937d0ee5d8 AMDGPU/GlobalISel: Remove unnecessary code
The minnum/maxnum case are dead, and the cvt is handled by the
default.

llvm-svn: 366685
2019-07-22 13:05:25 +00:00
David Green 8876a312a8 [ARM] Fix for MVE VPT block pass
We need to ensure that the number of T's is correct when adding multiple
instructions into the same VPT block.

Differential revision: https://reviews.llvm.org/D65049

llvm-svn: 366684
2019-07-22 12:51:38 +00:00
Simon Pilgrim b3d719e1cf [X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Fixed out of bounds load assert identified in rL366501

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366681
2019-07-22 12:44:10 +00:00
Christudasan Devadasan 006cf8c03d Added address-space mangling for stack related intrinsics
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64561

llvm-svn: 366679
2019-07-22 12:42:48 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Jay Foad 298500ae33 [AMDGPU] Save some work when an atomic op has no uses
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

llvm-svn: 366667
2019-07-22 07:19:44 +00:00
Serguei Katkov c6c31da867 [Loop Peeling] Fix the handling of branch weights of peeled off branches.
Current algorithm to update branch weights of latch block and its copies is
based on the assumption that number of peeling iterations is approximately equal
to trip count.

However it is not correct. According to profitability check in one case we can decide to peel
in case it helps to reduce the number of phi nodes. In this case the number of peeled iteration
can be less then estimated trip count.

This patch introduces another way to set the branch weights to peeled of branches.
Let F is a weight of the edge from latch to header.
Let E is a weight of the edge from latch to exit.
F/(F+E) is a probability to go to loop and E/(F+E) is a probability to go to exit.
Then, Estimated TripCount = F / E.
For I-th (counting from 0) peeled off iteration we set the the weights for
the peeled latch as (TC - I, 1). It gives us reasonable distribution,
The probability to go to exit 1/(TC-I) increases. At the same time
the estimated trip count of remaining loop reduces by I.

As a result after peeling off N iteration the weights will be
(F - N * E, E) and trip count of loop becomes
F / E - N or TC - N.

The idea is taken from the review of the patch D63918 proposed by Philip.

Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64235

llvm-svn: 366665
2019-07-22 05:15:34 +00:00
Simon Pilgrim 86fa3270ef [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI.
Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.

llvm-svn: 366660
2019-07-21 19:04:44 +00:00
Craig Topper e6cd20ba53 [InstCombine] Update comment I missed in r366649. NFC
llvm-svn: 366658
2019-07-21 16:15:03 +00:00
Aditya Nandakumar d7504a1569 [GISel]: Attach missing range metadata while translating G_LOADs
https://reviews.llvm.org/D65048

Attach range information to G_LOAD when only defining one register.

reviewed by: arsenm

llvm-svn: 366656
2019-07-21 14:07:54 +00:00
Craig Topper 1d149d08d3 [InstCombine] Remove insertRangeTest code that handles the equality case.
For equality, the function called getTrue/getFalse with the VT
of the comparison input. But getTrue/getFalse need the boolean VT.
So if this code ever executed, it would assert.

I believe these cases are removed by InstSimplify so we don't get here.

So this patch just fixes up an assert to exclude the equality
possibility and removes the broken code.

llvm-svn: 366649
2019-07-21 06:43:38 +00:00
Craig Topper 8fabdfe9fc [InstCombine] Don't use AddOne/SubOne to see if two APInts are 1 apart. Use APInt operations instead. NFCI
AddOne/SubOne create new Constant objects. That seems heavy for
comparing ConstantInts which wrap APInts. Just do the math on
on the APInts and compare them.

llvm-svn: 366648
2019-07-21 05:26:05 +00:00
Roman Lebedev cd9b19484b [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is a all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

llvm-svn: 366637
2019-07-20 16:33:15 +00:00
Simon Pilgrim adec0f2252 [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.

llvm-svn: 366636
2019-07-20 15:20:11 +00:00
Florian Hahn 0a7faa4e3d [Local] Zap blockaddress without users in ConstantFoldTerminator.
If the blockaddress is not destoryed, the destination block will still
be marked as having its address taken, limiting further transformations.

I think there are other places where the dead blockaddress constants are kept
around, I'll look into that as follow up.

Reviewers: craig.topper, brzycki, davide

Reviewed By: brzycki, davide

Differential Revision: https://reviews.llvm.org/D64936

llvm-svn: 366633
2019-07-20 12:25:47 +00:00
Jessica Paquette 41affad967 [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs
Sometimes, you can end up with cross-bank copies between same-sized GPRs and
FPRs, which feed into G_STOREs. When these copies feed only into stores, they
aren't necessary; we can just store using the original register bank.

This provides some minor code size savings for some floating point SPEC
benchmarks. (Around 0.2% for 453.povray and 450.soplex)

This issue doesn't seem to show up due to regbankselect or anything similar. So,
this patch introduces an early select function, `contractCrossBankCopyIntoStore`
which performs the contraction when possible. The selector then continues
normally and selects the correct store opcode, eliminating needless copies
along the way.

Differential Revision: https://reviews.llvm.org/D65024

llvm-svn: 366625
2019-07-20 01:55:35 +00:00
Guanzhong Chen 5204f7611f [WebAssembly] Compute and export TLS block alignment
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.

Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                             __builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65028

llvm-svn: 366624
2019-07-19 23:34:16 +00:00
Matt Arsenault f3bfb85bce AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces
llvm-svn: 366621
2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin 05d9e6a2a3 [AMDGPU] Autogenerate register sequences in tuples
Differential Revision: https://reviews.llvm.org/D65007

llvm-svn: 366619
2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin 7b5a54e369 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

llvm-svn: 366616
2019-07-19 21:29:51 +00:00
Matt Arsenault 5e23f42820 AMDGPU: Avoid custom predicates for stores with glue
llvm-svn: 366613
2019-07-19 21:01:30 +00:00
Matt Arsenault e3401a9b86 AMDGPU: Redefine setcc condition PatLeafs
Avoid using custom code predicates.

llvm-svn: 366609
2019-07-19 20:24:40 +00:00
Matt Arsenault 48c0df5d46 AMDGPU: Don't rely on m0 being -1 for GWS offsets
This only works if the high bits of m0 are also 0, so m0 would have to
be set to 0xffff.

llvm-svn: 366608
2019-07-19 20:01:24 +00:00
Matt Arsenault 85f3890126 AMDGPU: Force s_waitcnt after GWS instructions
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.

llvm-svn: 366607
2019-07-19 19:47:30 +00:00
Matt Arsenault c14334e959 LiveIntervals: Fix handleMove asserting on BUNDLE
The top-level BUNDLE instruction should behave as an ordinary
instruction. It is supposed to have all relevant registers as implicit
operands. Moving it should work as any other instruction. I believe
the assert intended to avoid moving instructions inside bundles.

llvm-svn: 366605
2019-07-19 19:32:00 +00:00
Nick Desaulniers 4e9196ebcb Revert "Use the MachineBasicBlock symbol for a callbr target"
This reverts commit r366523/ccbffefccaff42b0d094c9ef0f49fc3e8c8456ea.

Two regressions were immediately reported:
- https://github.com/ClangBuiltLinux/linux/issues/614
- https://github.com/ClangBuiltLinux/linux/issues/615

Reported-by: nathanchance
llvm-svn: 366600
2019-07-19 18:18:02 +00:00
Stanislav Mekhanoshin 01fcf9238f [AMDGPU] Allow register tuples to set asm names
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.

Differential Revision: https://reviews.llvm.org/D64967

llvm-svn: 366598
2019-07-19 18:05:01 +00:00
Matt Arsenault 7df225dfc2 AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads
The DAG lowering sets dereferencable and invariant, not nontemporal.

llvm-svn: 366597
2019-07-19 17:52:56 +00:00
Matt Arsenault 08494f6231 AMDGPU/GlobalISel: Selection for fminnum/fmaxnum
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.

llvm-svn: 366585
2019-07-19 14:42:40 +00:00
Matt Arsenault b60a2ae40e AMDGPU/GlobalISel: Support arguments with multiple registers
Handles structs used directly in argument lists.

llvm-svn: 366584
2019-07-19 14:29:30 +00:00
Matt Arsenault fecf43eba3 AMDGPU/GlobalISel: Rewrite lowerFormalArguments
This should now handle everything except structs passed as multiple
registers.

I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.

This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.

llvm-svn: 366582
2019-07-19 14:15:18 +00:00
Matt Arsenault 1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Matt Arsenault 5905aae169 DAG: Handle dbg_value for arguments split into multiple subregs
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.

llvm-svn: 366574
2019-07-19 13:36:46 +00:00
Dmitry Preobrazhensky 4ccb7f8c45 [AMDGPU][MC] Corrected parsing of branch offsets
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64629

llvm-svn: 366571
2019-07-19 13:12:47 +00:00
Kai Luo dec624682e [MachineCSE][MachinePRE] Avoid hoisting code from code regions into hot BBs.
Summary:
Current PRE hoists common computations into
CMBB = DT->findNearestCommonDominator(MBB, MBB1).
However, if CMBB is in a hot loop body, we might get performance
degradation.

Differential Revision: https://reviews.llvm.org/D64394

llvm-svn: 366570
2019-07-19 12:58:16 +00:00
Than McIntosh e238a4c757 [X86] for split stack, not save/restore nested arg if unused
Summary:
For split-stack, if the nested argument (i.e. R10) is not used, no need to save/restore it in the prologue.

Reviewers: thanm

Reviewed By: thanm

Subscribers: mstorsjo, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64673

llvm-svn: 366569
2019-07-19 12:54:44 +00:00
Oliver Stannard 8780c0dda2 Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions
We'd like to remove this whole function, because these are properties of
functions, not the target as a whole. These two are easy to remove
because they are only used for emitting ARM build attributes, which
expects them to represent the defaults for the whole module, not just
the last function generated.

This is needed to get correct build attributes when using IPRA on ARM,
because IPRA causes resetTargetOptions to get called before
ARMAsmPrinter::emitAttributes.

Differential revision: https://reviews.llvm.org/D64929

llvm-svn: 366562
2019-07-19 10:37:37 +00:00
Oliver Stannard 0ed7732671 [IPRA] Don't rely on non-exact function definitions
If a function definition is not exact, then the linker could select a
differently-compiled version of it, which could use different registers.

https://reviews.llvm.org/D64909

llvm-svn: 366557
2019-07-19 09:59:26 +00:00
Mikhail Maltsev 0b001f94a5 [ARM] Add <saturate> operand to SQRSHRL and UQRSHLL
Summary:
According to the new Armv8-M specification
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the
instructions SQRSHRL and UQRSHLL now have an additional immediate
operand <saturate>. The new assembly syntax is:

SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm

where <saturate> can be either 64 (the existing behavior) or 48, in
that case the result is saturated to 48 bits.

The new operand is encoded as follows:
  #64 Encoded as sat = 0
  #48 Encoded as sat = 1
sat is bit 7 of the instruction bit pattern.

This patch adds a new assembler operand class MveSaturateOperand which
implements parsing and encoding. Decoding is implemented in
DecodeMVEOverlappingLongShift.

Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer

Reviewed By: simon_tatham

Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64810

llvm-svn: 366555
2019-07-19 09:46:28 +00:00
Hubert Tong 2711e16b35 [sanitizers] Use covering ObjectFormatType switches
Summary:
This patch removes the `default` case from some switches on
`llvm::Triple::ObjectFormatType`, and cases for the missing enumerators
(`UnknownObjectFormat`, `Wasm`, and `XCOFF`) are then added.

For `UnknownObjectFormat`, the effect of the action for the `default`
case is maintained; otherwise, where `llvm_unreachable` is called,
`report_fatal_error` is used instead.

Where the `default` case returns a default value, `report_fatal_error`
is used for XCOFF as a placeholder. For `Wasm`, the effect of the action
for the `default` case in maintained.

The code is structured to avoid strongly implying that the `Wasm` case
is present for any reason other than to make the switch cover all
`ObjectFormatType` enumerator values.

Reviewers: sfertile, jasonliu, daltenty

Reviewed By: sfertile

Subscribers: hiraditya, aheejin, sunfish, llvm-commits, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64222

llvm-svn: 366544
2019-07-19 08:46:18 +00:00
Jay Foad 7d06ffff46 [AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
   form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
   combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

  v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

  v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.

Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64411

llvm-svn: 366543
2019-07-19 08:40:37 +00:00
Serguei Katkov bde33af85a [Loop Peeling] Enable peeling of multiple exits by default.
Enable loop peeling with multiple exits where all non-latch exits
ends up with deopt by default.

Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64619

llvm-svn: 366542
2019-07-19 08:35:45 +00:00
Roman Lebedev f2eb403144 [InstCombine] Dropping redundant masking before left-shift [5/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:

alive proofs:
f: https://rise4fun.com/Alive/7U3

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64524

llvm-svn: 366540
2019-07-19 08:26:58 +00:00
Roman Lebedev 441c9d6ca8 [InstCombine] Dropping redundant masking before left-shift [4/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
e: https://rise4fun.com/Alive/0FT

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64521

llvm-svn: 366539
2019-07-19 08:26:47 +00:00
Roman Lebedev 3c212ce305 [InstCombine] Dropping redundant masking before left-shift [3/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
d: https://rise4fun.com/Alive/I5Y

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64519

llvm-svn: 366538
2019-07-19 08:26:37 +00:00
Roman Lebedev 2ebe57386d [InstCombine] Dropping redundant masking before left-shift [2/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
c: https://rise4fun.com/Alive/RgJh

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64517

llvm-svn: 366537
2019-07-19 08:26:25 +00:00
Roman Lebedev 4422a1657c [InstCombine] Dropping redundant masking before left-shift [1/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
b: https://rise4fun.com/Alive/y8M

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64514

llvm-svn: 366536
2019-07-19 08:26:13 +00:00
Roman Lebedev a5f0824eb5 [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
a: https://rise4fun.com/Alive/wi9

Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction
i'm really interested in handling even uncanonical patterns,
since i have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, huihuiz, xbolva00

Reviewed By: xbolva00

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64512

llvm-svn: 366535
2019-07-19 08:25:43 +00:00
Hsiangkai Wang c5ecdd3c5a [DebugInfo] Some fields do not need relocations even relax is enabled.
In debug frame information, some fields, e.g., Length in CIE/FDE and
Offset in FDE are attributes to describe the structure of CIE/FDE. They
are not related to the relaxed code. However, these attributes are
symbol differences. So, in current design, these attributes will be
filled as zero and LLVM generates relocations for them.

We only need to generate relocations for symbols in executable sections.
So, if the symbols are not located in executable sections, we still
evaluate their values under relaxation.

Differential Revision: https://reviews.llvm.org/D61584

llvm-svn: 366531
2019-07-19 06:10:36 +00:00
Hsiangkai Wang 18ccfadd46 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.

There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366524
2019-07-19 02:03:34 +00:00
Bill Wendling ccbffefcca Use the MachineBasicBlock symbol for a callbr target
Summary:
Inline asm doesn't use labels when compiled as an object file. Therefore, we
shouldn't create one for the (potential) callbr destination. Instead, use the
symbol for the MachineBasicBlock.

Reviewers: nickdesaulniers, craig.topper

Reviewed By: nickdesaulniers

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64888

llvm-svn: 366523
2019-07-19 01:10:28 +00:00
Amara Emerson cf12c7815f [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later.
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.

Differential Revision: https://reviews.llvm.org/D64895

llvm-svn: 366516
2019-07-19 00:24:45 +00:00
Stanislav Mekhanoshin a9c71e01e7 [AMDGPU] Drop Reg32 and use regular AsmName
This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb.

Differential Revision: https://reviews.llvm.org/D64952

llvm-svn: 366505
2019-07-18 22:18:33 +00:00
Jessica Paquette 7a1dcc5ff1 [GlobalISel][AArch64] Add support for base register + offset register loads
Add support for folding G_GEPs into loads of the form

```
ldr reg, [base, off]
```

when possible. This can save an add before the load. Currently, this is only
supported for loads of 64 bits into 64 bit registers.

Add a new addressing mode function, `selectAddrModeRegisterOffset` which
performs this folding when it is profitable.

Also add a test for addressing modes for G_LOAD.

Differential Revision: https://reviews.llvm.org/D64944

llvm-svn: 366503
2019-07-18 21:50:11 +00:00
Peter Collingbourne 50057f3288 CodeGen: Allow !associated metadata to point to aliases.
This is a small extension of !associated, mostly useful for the implementation
convenience of instrumentation passes that RAUW globals with aliases, such
as LowerTypeTests.

Differential Revision: https://reviews.llvm.org/D64951

llvm-svn: 366502
2019-07-18 21:37:16 +00:00
Reid Kleckner ba9c9e62cb Revert [X86] EltsFromConsecutiveLoads - support common source loads
This reverts r366441 (git commit 48104ef7c9)

This causes clang to fail to compile some file in Skia. Reduction soon.

llvm-svn: 366501
2019-07-18 21:26:41 +00:00
Guanzhong Chen df4479200b [WebAssembly] Fix __builtin_wasm_tls_base intrinsic
Summary:
Properly generate the outchain for the `__builtin_wasm_tls_base` intrinsic.

Also marked the intrinsic pure, per @sunfish's suggestion.

Reviewers: tlively, aheejin, sbc100, sunfish

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits, sunfish

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64949

llvm-svn: 366499
2019-07-18 21:17:52 +00:00
Peter Collingbourne 68f3fc2d91 Fix typo in r366494. Spotted by Yuanfang Chen.
llvm-svn: 366497
2019-07-18 21:03:37 +00:00
Steven Wu dac7fca530 Remove the static initialize introduced in r365099
Summary:
Some polish for r365099 which adds a static initializer to
MachOObjectFile. Remove it by moving it to file scope.

Reviewers: smeenai, alexshap, compnerd, mtrent, anushabasana

Reviewed By: smeenai

Subscribers: hiraditya, jkorous, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64873

llvm-svn: 366496
2019-07-18 21:01:21 +00:00
Peter Collingbourne d1ec8eb84f IR: Teach Constant::needsRelocation() that relative pointers don't need to be relocated.
This causes sections with relative pointers to be marked as read only,
which means that they won't end up sharing pages with writable data.

Differential Revision: https://reviews.llvm.org/D64948

llvm-svn: 366494
2019-07-18 20:56:21 +00:00
Jordan Rose 887d31ccee FileSystem: Check for DTTOIF alone, not _DIRENT_HAVE_D_TYPE
While 'd_type' is a non-standard extension to `struct dirent`, only
glibc signals its presence with a macro '_DIRENT_HAVE_D_TYPE'.
However, any platform with 'd_type' also includes a way to convert to
mode_t values using the macro 'DTTOIF', so we can check for that alone
and still be confident that the 'd_type' member exists.

(If this turns out to be wrong, I'll go back and set up an actual
CMake check.)

I couldn't think of how to write a test for this, because I couldn't
think of how to test that a 'stat' call doesn't happen without
controlling the filesystem or intercepting 'stat', and there's no good
cross-platform way to do that that I know of.

Follow-up (almost a year later) to r342089.

rdar://problem/50592673
https://reviews.llvm.org/D64940

llvm-svn: 366486
2019-07-18 20:05:11 +00:00
Lang Hames 9e52d0576a [ORC] Suppress an ORCv1 deprecation warning.
llvm-svn: 366485
2019-07-18 19:55:42 +00:00
Amy Huang f332fe642c [COFF] Change a variable type to be const in the HeapAllocSite map.
llvm-svn: 366479
2019-07-18 18:22:52 +00:00
Guanzhong Chen 801fa8e6b9 [WebAssembly] Implement __builtin_wasm_tls_base intrinsic
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.

Reviewers: tlively, aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64900

llvm-svn: 366475
2019-07-18 17:53:22 +00:00
Michael Liao 17a8a9277c [LAA] Re-check bit-width of pointers after stripping.
Summary:
- As the pointer stripping now tracks through `addrspacecast`, prepare
  to handle the bit-width difference from the result pointer.

Reviewers: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64928

llvm-svn: 366470
2019-07-18 17:30:27 +00:00
Peter Collingbourne aa6a7df64a MC: AArch64: Add support for prel_g* relocation specifiers.
Differential Revision: https://reviews.llvm.org/D64683

llvm-svn: 366462
2019-07-18 16:54:33 +00:00
Peter Collingbourne 76427f849f AArch64: Unify relocation restrictions between MOVK/MOVN/MOVZ.
There doesn't seem to be a practical reason for these instructions to have
different restrictions on the types of relocations that they may be used
with, notwithstanding the language in the ELF AArch64 spec that implies that
specific relocations are meant to be used with specific instructions.

For example, we currently forbid the first instruction in the following
sequence, despite it currently being used by clang to generate a global
reference under -mcmodel=large:

	movz	x0, #:abs_g0_nc:foo
	movk	x0, #:abs_g1_nc:foo
	movk	x0, #:abs_g2_nc:foo
	movk	x0, #:abs_g3:foo

Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations
that they currently accept individually.

Differential Revision: https://reviews.llvm.org/D64466

llvm-svn: 366461
2019-07-18 16:51:53 +00:00
Hsiangkai Wang 657277e0f1 Revert "[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame."
This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e.

llvm-svn: 366444
2019-07-18 15:06:50 +00:00
Hsiangkai Wang e43ce1a958 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.

There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366442
2019-07-18 14:47:34 +00:00
Simon Pilgrim 48104ef7c9 [X86] EltsFromConsecutiveLoads - support common source loads
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366441
2019-07-18 14:33:25 +00:00
Simon Pilgrim 8b525e357f [DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI.
NFC step towards reusing this in other EXTRACT_SUBVECTOR combines.

llvm-svn: 366435
2019-07-18 13:45:53 +00:00
Thomas Preud'homme 70494494c1 [FileCheck] Fix numeric variable redefinition
Summary:
Commit r365249 changed usage of FileCheckNumericVariable to have one
instance of that class per variable as opposed to one instance per
definition of a given variable as was done before. However, it retained
the safety check in setValue that it should only be called with the
variable unset, even after r365625.

However this causes assert failure when a non-pseudo variable is being
redefined. And while redefinition of @LINE at each CHECK line work in
the general case, it caused problem when a substitution failed (fixed in
r365624) and still causes problem when a CHECK line does not match since
@LINE's value is cleared after substitutions in match() happened but
printSubstitutions also attempts a substitution.

This commit solves the root of the problem by changing setValue to set a
new value regardless of whether a value was set or not, thus fixing all
the aforementioned issues.

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64882

llvm-svn: 366434
2019-07-18 13:39:04 +00:00
Sanjay Patel e654785912 [x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483)
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.

In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.

As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.

Differential Revision: https://reviews.llvm.org/D64707

llvm-svn: 366431
2019-07-18 12:48:01 +00:00
Diogo N. Sampaio 11512e742b [ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine
Summary:
PerformVMOVRRDCombine ommits adding a offset
of 4 to the PointerInfo, when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}

Which would allow the machine scheduller
to break dependencies with the second load.

 - pr42638

Reviewers: eli.friedman, dmgreen, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64870

llvm-svn: 366423
2019-07-18 10:05:56 +00:00
Chen Zheng c38e3efe27 [SCEV] add no wrap flag for SCEVAddExpr.
Differential Revision: https://reviews.llvm.org/D64868

llvm-svn: 366419
2019-07-18 09:23:19 +00:00
Alex Bradbury b8d352a08b [RISCV] Reset NoPHIS MachineFunctionProperty in emitSelectPseudo
We insered PHIS were there were none before, so the property must be
reset. This error was found on an EXPENSIVE_CHECKS build.

llvm-svn: 366412
2019-07-18 07:52:41 +00:00
Serguei Katkov 0ffa833d54 [LoopInfo] Use early return in branch weight update functions. NFC.
llvm-svn: 366411
2019-07-18 07:36:20 +00:00
Craig Topper 8da0402210 [X86] Disable combineConcatVectors for vXi1 vectors.
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.

llvm-svn: 366405
2019-07-18 06:18:06 +00:00
Alex Bradbury 44deaf7e54 [DWARF][RISCV] Add support for RISC-V relocations needed for debug info
When code relaxation is enabled many RISC-V fixups are not resolved but
instead relocations are emitted. This happens even for DWARF debug
sections. Therefore, to properly support the parsing of DWARF debug info
we need to be able to resolve RISC-V relocations. This patch adds:

* Support for RISC-V relocations in RelocationResolver
* DWARF support for two relocations per object file offset
* DWARF changes to support relocations in more DIE fields

The two relocations per offset change is needed because some RISC-V
relocations (used for label differences) come in pairs.

Relocations can also be emitted for DWARF fields where relocations were
not yet evaluated. Adding relocation support for some of these fields is
essencial. On the other hand, LLVM currently emits RISC-V relocations
for fixups that could be safely evaluated, since they can never be
affected by code relaxations. This patch also adds relocation support
for the fields affected by those extraneous relocations (the DWARF unit
entry Length, and the DWARF debug line entry TotalLength and
PrologueLength), for testing purposes.

Differential Revision: https://reviews.llvm.org/D62062
Patch by Luís Marques.

llvm-svn: 366402
2019-07-18 05:22:55 +00:00
Alex Bradbury 8aba95d64c [RISCV] Avoid signed integer overflow UB in RISCVMatInt::generateInstSeq
Found by UBSan.

llvm-svn: 366398
2019-07-18 04:02:58 +00:00
Alex Bradbury ad73a436dc [RISCV] Don't acccess an invalidated iterator in RISCVInstrInfo::removeBranch
Issue found by ASan.

llvm-svn: 366397
2019-07-18 03:23:47 +00:00
Fangrui Song f358cf8de2 [AArch64] Add dependency from AArch64CodeGen to TransformUtils to fix -DBUILD_SHARED_LIBS=on link error after D64173/r366361
This fixes:

ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value*, llvm::DenseMap<llvm::Value*, llvm::Alloc aInst*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::AllocaInst*> >&)
>>> referenced by AArch64StackTagging.cpp

llvm-svn: 366396
2019-07-18 01:53:08 +00:00
Nilanjana Basu 4e22770219 Changes to display code view debug info type records in hex format
llvm-svn: 366390
2019-07-17 23:43:58 +00:00
Evgeniy Stepanov 6abd78cc7c Make DT a transitive dependency of LI.
Summary:
LoopInfoWrapperPass::verify uses DT, which means DT must be alive
even if it has no direct users.

Fixes a crash in expensive checks mode.

Reviewers: pcc, leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64896

llvm-svn: 366388
2019-07-17 23:31:59 +00:00
Denis Bakhvalov 3eab4819f2 [llvm-bcanalyzer] Fixed error 'Expected<T> must be checked before access or destruction'
After rL365286 I had failing test:
  LLVM :: tools/gold/X86/v1.12/thinlto_emit_linked_objects.ll

It was failing with the output:
$ llvm-bcanalyzer --dump llvm/test/tools/gold/X86/v1.12/Output/thinlto_emit_linked_objects.ll.tmp3.o.thinlto.bc
Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Unexpected end of file reading 0 of 0 bytesStack dump:

Change-Id: I07e03262074ea5e0aae7a8d787d5487c87f914a2
llvm-svn: 366387
2019-07-17 23:28:39 +00:00
Nico Weber 7bb5fc0583 llvm-pdbdump: Fix several smaller issues with injected source compression handling
- getCompression() used to return a PDB_SourceCompression even though
  the docs for IDiaInjectedSource are explicit about the return value
  being compiler-dependent. Return an uint32_t instead, and make the
  printing code handle unknown values better by printing "Unknown" and
  the int value instead of not printing any compression.

- Print compressed contents as hex dump, not as string.

- Add compression type "DotNet", which is used (at least) by csc.exe,
  the C# compiler. Also add a lengthy comment describing the stream
  contents (derived from looking at the raw hex contents long enough
  to see the GUIDs, which led me to the roslyn and mono implementations
  for handling this).

- The native injected source dumper was dumping the contents of the
  whole data stream -- but csc.exe writes a stream that's padded with
  zero bytes to the next 512 boundary, and the dia api doesn't display
  those padding bytes. So make NativeInjectedSource::getCode() do the
  same thing.

Differential Revision: https://reviews.llvm.org/D64879

llvm-svn: 366386
2019-07-17 22:59:52 +00:00
Stanislav Mekhanoshin 7872d76a16 [AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand()
Differential Revision: https://reviews.llvm.org/D64892

llvm-svn: 366385
2019-07-17 22:58:43 +00:00
Craig Topper 61fff7a337 [X86] Make sure we mark 128/256 MLOAD as Legal with VLX when min-legal-vector-width=256 is in effect.
This started triggering an assertion after r364718 when we made
these Custom under AVX2.

llvm-svn: 366382
2019-07-17 22:26:00 +00:00
Peter Collingbourne 3b82b92c6b hwasan: Initialize the pass only once.
This will let us instrument globals during initialization. This required
making the new PM pass a module pass, which should still provide access to
analyses via the ModuleAnalysisManager.

Differential Revision: https://reviews.llvm.org/D64843

llvm-svn: 366379
2019-07-17 21:45:19 +00:00
Stanislav Mekhanoshin 9c7f4264d3 [AMDGPU] Stop special casing flat_scratch for register name
Differential Revision: https://reviews.llvm.org/D64885

llvm-svn: 366376
2019-07-17 21:35:11 +00:00
Evgeniy Stepanov f45fd429b7 Speculative fix for stack-tagging.ll failure.
Depending on the evaluation order of function call arguments,
the current code may insert a use before def.

llvm-svn: 366375
2019-07-17 21:27:44 +00:00
Hideto Ueno 4a09a73fb0 [Attributor][NFC] Remove unnecessary debug output
llvm-svn: 366373
2019-07-17 21:11:02 +00:00
Nilanjana Basu 6e4076699c Adding inline comments to code view type record directives for better readability
llvm-svn: 366372
2019-07-17 21:01:12 +00:00
Francis Visoiu Mistrih 9f2b290add [PEI] Don't re-allocate a pre-allocated stack protector slot
The LocalStackSlotPass pre-allocates a stack protector and makes sure
that it comes before the local variables on the stack.

We need to make sure that later during PEI we don't re-allocate a new
stack protector slot. If that happens, the new stack protector slot will
end up being **after** the local variables that it should be protecting.

Therefore, we would have two slots assigned for two different stack
protectors, one at the top of the stack, and one at the bottom. Since
PEI will overwrite the assigned slot for the stack protector, the load
that is used to compare the value of the stack protector will use the
slot assigned by PEI, which is wrong.

For this, we need to check if the object is pre-allocated, and re-use
that pre-allocated slot.

Differential Revision: https://reviews.llvm.org/D64757

llvm-svn: 366371
2019-07-17 20:46:19 +00:00
Francis Visoiu Mistrih 90ba54bf67 [CodeGen][NFC] Simplify checks for stack protector index checking
Use `hasStackProtectorIndex()` instead of `getStackProtectorIndex() >=
0`.

llvm-svn: 366369
2019-07-17 20:46:09 +00:00
Matt Arsenault 0966dd0d69 GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.

Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.

llvm-svn: 366367
2019-07-17 20:22:44 +00:00
Matt Arsenault 914a59cad8 GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.

I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.

llvm-svn: 366366
2019-07-17 20:22:38 +00:00
Evgeniy Stepanov 851339fb29 Basic MTE stack tagging instrumentation.
Summary:
Use MTE intrinsics to tag stack variables in functions with
sanitize_memtag attribute.

Reviewers: pcc, vitalybuka, hctim, ostannard

Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64173

llvm-svn: 366361
2019-07-17 19:24:12 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Momchil Velikov 0e2b74a2b0 Revert [AArch64] Add support for Transactional Memory Extension (TME)
This reverts r366322 (git commit 4b8da3a503)

llvm-svn: 366355
2019-07-17 17:43:32 +00:00
Daniil Fukalov d912a9ba9b [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.

amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

llvm-svn: 366348
2019-07-17 16:51:29 +00:00
Lang Hames 1716454027 [ORC] Add deprecation warnings to ORCv1 layers and utilities.
Summary:
ORCv1 is deprecated. The current aim is to remove it before the LLVM 10.0
release. This patch adds deprecation attributes to the ORCv1 layers and
utilities to warn clients of the change.

Reviewers: dblaikie, sgraenitz, AlexDenisov

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64609

llvm-svn: 366344
2019-07-17 16:40:52 +00:00
Matt Arsenault 06eed42213 AMDGPU: Use getTargetConstant
Avoids creating an extra intermediate mov.

llvm-svn: 366340
2019-07-17 15:35:36 +00:00
Hideto Ueno 11d3710c1c [Attributor] Deduce "willreturn" function attribute
Summary:
Deduce the "willreturn" attribute for functions.

For now, intrinsics are not willreturn. More annotation will be done in another patch.

Reviewers: jdoerfert

Subscribers: jvesely, nhaehnle, nicholas, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63046

llvm-svn: 366335
2019-07-17 15:15:43 +00:00
Alex Bradbury ab009a602e [AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.

This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table

Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.

llvm-svn: 366329
2019-07-17 14:00:35 +00:00
Alex Bradbury b94c233d06 [RISCV] Set correct encodings for DWARF exception handling
This patch sets correct encodings for DWARF exception handling for RISC-V
(other than call site encoding, which must be udata4 rather than uleb128 and
is handled by D63415).

This has the same intend as D63409, except this version matches GCC/binutils
behaviour which uses the same encodings regardless of PIC/non-PIC and
medlow/medany code model.

llvm-svn: 366327
2019-07-17 13:54:38 +00:00
Jay Foad 70235c642e [AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64809

llvm-svn: 366323
2019-07-17 13:40:03 +00:00
Momchil Velikov 4b8da3a503 [AArch64] Add support for Transactional Memory Extension (TME)
TME is a future architecture technology, documented in

https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Patch by Javed Absar and Momchil Velikov

Differential Revision: https://reviews.llvm.org/D64416

llvm-svn: 366322
2019-07-17 13:23:27 +00:00
Justin Hibbits 0257c6b659 PowerPC: Fix register spilling for SPE registers
Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list.  This avoids
needlessly spilling the SPE non-volatile registers when they're not used.

As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56703

llvm-svn: 366319
2019-07-17 12:30:48 +00:00
Justin Hibbits 5214956eaa PowerPC/SPE: Fix load/store handling for SPE
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset.  However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).

The update to the register spill test is taken partially from the test
case shown in D49754.

Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted.  This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.

Part of the patch contributed by Kei Thomsen.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54409

llvm-svn: 366318
2019-07-17 12:30:04 +00:00
Petar Avramovic 1e62635d05 [MIPS GlobalISel] ClampScalar and select pointer G_ICMP
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.

Differential Revision: https://reviews.llvm.org/D64856

llvm-svn: 366317
2019-07-17 12:08:01 +00:00
Nicolai Haehnle 8b7041a5c6 AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC
Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630

Reviewers: rampitec, mareko

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64807

Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314
2019-07-17 11:22:57 +00:00
Nicolai Haehnle a256b8b7d7 AMDGPU: Improve alias analysis for GDS
Summary: GDS cannot alias anything else.

Original patch by: Marek Olšák

Reviewers: arsenm, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64114

Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0
llvm-svn: 366313
2019-07-17 11:22:19 +00:00
Diana Picus 37e403d18c [ARM GlobalISel] Cleanup CallLowering. NFC
Migrate CallLowering::lowerReturnVal to use the same infrastructure as
lowerCall/FormalArguments and remove the now obsolete code path from
splitToValueTypes.

Forgot to push this earlier.

llvm-svn: 366308
2019-07-17 10:01:27 +00:00
Simon Atanasyan 4c1e440892 [mips] Use mult/mflo pattern on 64-bit targets prior to MIPS64
The `MUL` instruction is available starting from the MIPS32/MIPS64 targets.

llvm-svn: 366301
2019-07-17 08:11:40 +00:00
Simon Atanasyan a884afb6f8 [mips] Implement .cplocal directive
This directive forces to use the alternate register for context pointer.
For example, this code:
  .cplocal $4
  jal foo
expands to:
  ld    $25, %call16(foo)($4)
  jalr  $25

Differential Revision: https://reviews.llvm.org/D64743

llvm-svn: 366300
2019-07-17 08:11:31 +00:00
Simon Atanasyan 7f308af5ee [mips] Support the "o" inline asm constraint
As well as other LLVM targets we do not handle "offsettable"
memory addresses in any special way. In other words, the "o" constraint
is an exact equivalent of the "m" one. But some existing code require
the "o" constraint support.

This fixes PR42589.

Differential Revision: https://reviews.llvm.org/D64792

llvm-svn: 366299
2019-07-17 08:11:15 +00:00
Stanislav Mekhanoshin e5012ab308 [AMDGPU] Autogenerate register asm names
Differential Revision: https://reviews.llvm.org/D64839

llvm-svn: 366283
2019-07-16 23:44:21 +00:00
Matt Arsenault 1c3f4ec7fc GlobalISel: Add overload of handleAssignments with CCState
AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.

The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.

llvm-svn: 366279
2019-07-16 22:41:34 +00:00
Guanzhong Chen 0a8d4df799 [WebAssembly] Compile all TLS on Emscripten as local-exec
Summary:
Currently, on Emscripten, dynamic linking is not supported with threads.
This means that if thread-local storage is used, it must be used in a
statically-linked executable. Hence, local-exec is the only possible model.

This diff compiles all TLS variables to use local-exec on Emscripten as a
temporary measure until dynamic linking is supported with threads.

The goal for this is to allow C++ types with constructors to be thread-local.

Currently, when `clang` compiles a `thread_local` variable with a constructor,
it generates `__tls_guard` variable:

    @__tls_guard = internal thread_local global i8 0, align 1

As no TLS model is specified, this is treated as general-dynamic, which we do
not support (and cannot support without implementing dynamic linking support
with threads in Emscripten). As a result, any C++ constructor in `thread_local`
variables would not compile.

By compiling all `thread_local` as local-exec, `__tls_guard` will compile and
we can support C++ constructors with TLS without implementing dynamic linking
with threads.

Depends on D64537

Reviewers: tlively, aheejin, sbc100

Reviewed By: aheejin

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64776

llvm-svn: 366275
2019-07-16 22:22:08 +00:00
Guanzhong Chen 42bba4b852 [WebAssembly] Implement thread-local storage (local-exec model)
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.

`.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.

`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.

`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.

To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This instrinsic function returns
the size of the thread-local storage for the current function.

The expected usage is to run something like the following upon thread startup:

    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, kripken, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64537

llvm-svn: 366272
2019-07-16 22:00:45 +00:00
Sanjay Patel d746a210e1 [x86] use more phadd for reductions
This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023

There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.

Differential Revision: https://reviews.llvm.org/D64760

llvm-svn: 366268
2019-07-16 21:30:41 +00:00
David Blaikie 40580d36c4 DWARF: Skip zero column for inline call sites
D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for
inline sites. However, that change wasn't aware of "-gno-column-info".
To avoid adding column info when "-gno-column-info" is used, now
DW_AT_call_column is only added when we have non-zero column (when
"-gno-column-info" is used, column will be zero).

Patch by Wenlei He!

Differential Revision: https://reviews.llvm.org/D64784

llvm-svn: 366264
2019-07-16 21:15:19 +00:00
Matt Arsenault f8c8284455 AMDGPU/GlobalISel: Select G_ASHR
llvm-svn: 366257
2019-07-16 20:31:25 +00:00
Matt Arsenault e5b28b98e9 AMDGPU/GlobalISel: Select G_LSHR
llvm-svn: 366256
2019-07-16 20:25:43 +00:00
Jinsong Ji 65e34a3143 [PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin
Summary:
This is exposed by our internal testing.
The reduced testcase will assert with "Impossible reg-to-reg copy"

We can't use COPY to do 32-bit to 64-bit conversion.

Reviewers: kbarton, hfinkel, nemanjai

Reviewed By: hfinkel

Subscribers: hiraditya, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64499

llvm-svn: 366255
2019-07-16 20:24:33 +00:00
Matt Arsenault 1b69fd275d AMDGPU/GlobalISel: Select G_SHL
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.

The 16-bit versions don't work yet.

llvm-svn: 366254
2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin 6e0fa292c2 [AMDGPU] Change register type for v32 vectors
When it is AReg_1024 this results in unnecessary copying into
AGPRs of a 32 element vectors even though they are not intended
for an mfma instruction.

Differential Revision: https://reviews.llvm.org/D64815

llvm-svn: 366252
2019-07-16 20:06:00 +00:00
Michael Liao ccf22ef94c Fix -Wreturn-type warning. NFC.
llvm-svn: 366251
2019-07-16 19:59:08 +00:00
Matt Arsenault 2d10407719 AMDGPU/GlobalISel: Fix selection of private stores
llvm-svn: 366249
2019-07-16 19:27:44 +00:00
Matt Arsenault 7161fb0be5 AMDGPU/GlobalISel: Select private loads
llvm-svn: 366248
2019-07-16 19:22:21 +00:00
Matt Arsenault dad1f89210 AMDGPU/GlobalISel: Select flat stores
llvm-svn: 366246
2019-07-16 18:42:53 +00:00
Matt Arsenault 7eb1902cd5 AMDGPU: Add register classes to flat store patterns
For some reason GlobalISelEmitter needs register classes to import
these, although it works for the load patterns.

llvm-svn: 366242
2019-07-16 18:26:42 +00:00
Philip Reames 6e1c3bb181 [IndVars] Speculative fix for an assertion failure seen in bots
I don't have an IR sample which is actually failing, but the issue described in the comment is theoretically possible, and should be guarded against even if there's a different root cause for the bot failures.

llvm-svn: 366241
2019-07-16 18:23:49 +00:00
Matt Arsenault 8f8d07e93b AMDGPU: Replace store PatFrags
Convert the easy cases to formats understood for GlobalISel.

llvm-svn: 366240
2019-07-16 18:21:25 +00:00
Matt Arsenault 35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Nico Weber d100b5dd01 Teach `llvm-pdbutil pretty -native` about `-injected-sources`
`pretty -native -injected-sources -injected-source-content` works with
this patch, and produces identical output to the dia version.

Differential Revision: https://reviews.llvm.org/D64428

llvm-svn: 366236
2019-07-16 18:04:26 +00:00
Jay Foad 17060f0a54 [AMDGPU] Optimize atomic max/min
Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64328

llvm-svn: 366235
2019-07-16 17:44:54 +00:00
Matt Arsenault c6fd5abecc AMDGPU: Redefine load PatFrags
Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.

llvm-svn: 366234
2019-07-16 17:38:50 +00:00
Michael Liao b3f967d411 [AMDGPU] Add the adjusted FP as a livein register.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64145

llvm-svn: 366223
2019-07-16 15:57:12 +00:00
Ulrich Weigand 450c62e33e [Strict FP] Allow more relaxed scheduling
Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling.  Specifially, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.

Differential Revision: https://reviews.llvm.org/D64412

Reviewed By: cameron.mcinally

llvm-svn: 366222
2019-07-16 15:55:45 +00:00
Francis Visoiu Mistrih 94bad22c2c [Remarks] Simplify and refactor the RemarkParser interface
Before, everything was based on some kind of type erased parser
implementation which container a lot of boilerplate code when multiple
formats were to be supported.

This simplifies it by:

* the remark now owns its arguments
* *always* returning an error from the implementation side
* working around the way the YAML parser reports errors: catch them through
callbacks and re-insert them in a proper llvm::Error
* add a CParser wrapper that is used when implementing the C API to
avoid cluttering the C++ API with useless state
* LLVMRemarkParserGetNext now returns an object that needs to be
released to avoid leaking resources
* add a new API to dispose of a remark entry: LLVMRemarkEntryDispose

llvm-svn: 366217
2019-07-16 15:25:05 +00:00
Francis Visoiu Mistrih cc909812a3 [Remarks][NFC] Combine ParserFormat and SerializerFormat
It's useless to have both.

llvm-svn: 366216
2019-07-16 15:24:59 +00:00
Amara Emerson 228a7b4f2a [ADCE] Fix non-deterministic behaviour due to iterating over a pointer set.
Original patch by Yann Laigle-Chapuy

Differential Revision: https://reviews.llvm.org/D64785

llvm-svn: 366215
2019-07-16 15:23:10 +00:00
Amaury Sechet f34a69c2e2 [DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry.
Summary:
As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b.

Depends on D57302

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59208

llvm-svn: 366214
2019-07-16 15:17:00 +00:00
Matt Arsenault 22c4a147a9 AMDGPU/GlobalISel: Fix test failures in release build
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.

Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.

llvm-svn: 366210
2019-07-16 14:28:30 +00:00
Kyrylo Tkachov eb72138340 [AArch64] Implement __jcvt intrinsic from Armv8.3-A
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.

This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).

make check-all didn't show any new failures.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

Differential Revision: https://reviews.llvm.org/D64495

llvm-svn: 366197
2019-07-16 09:27:39 +00:00
Kyrylo Tkachov a3e26d1a6c [NFC] Test commit: add full stop at end of comment
llvm-svn: 366195
2019-07-16 09:15:01 +00:00
Igor Kudrin f48bc01812 [DWARF] Fix the reserved values for unit length in DWARFDebugLine.
The DWARF3 documentation had inconsistency concerning the reserved range
for unit length values. The issue was fixed in DWARF4.

Differential Revision: https://reviews.llvm.org/D64622

llvm-svn: 366190
2019-07-16 07:01:08 +00:00
Igor Kudrin 74c350af21 [DWARF] Fix an incorrect format specifier.
This adjusts the format specifier because PCOffset is uint16_t.

Differential Revision: https://reviews.llvm.org/D64620

llvm-svn: 366189
2019-07-16 06:56:10 +00:00
Igor Kudrin 860f7ec058 [DWARF] Simplify DWARFAttribute. NFC.
The first argument in the constructor was ignored, and the remaining
arguments were always passed as their defaults.

Differential Revision: https://reviews.llvm.org/D64407

llvm-svn: 366188
2019-07-16 06:53:06 +00:00
Craig Topper c0b2ed664b [X86] In combineStore, don't convert v2f32 load/store pairs to f64 loads/stores.
Type legalization can take care of this. This gives DAG combine
a little more time with the original types.

llvm-svn: 366182
2019-07-16 05:52:27 +00:00
Alex Bradbury 1ffceaa543 [RISCV] Match GNU tools canonical JALR and add aliases
The canonical GNU form of JALR resembles a load/store instruction rather
than placing the immediate offset as a separate argument, so match this
behaviour. Also add parser-only aliases for the three-operand form, and
add other shorter aliases also emitted by GNU tools.

Differential Revision: https://reviews.llvm.org/D55277
Patch by James Clarke.

llvm-svn: 366179
2019-07-16 04:56:43 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Alex Bradbury bb479ca311 [RISCV] Avoid overflow when determining number of nops for code align
RISCVAsmBackend::shouldInsertExtraNopBytesForCodeAlign() assumed that the
align specified would be greater than or equal to the minimum nop length, but
that is not always the case - for example if a user specifies ".align 0" in
assembly.

Differential Revision: https://reviews.llvm.org/D63274
Patch by Edward Jones.

llvm-svn: 366176
2019-07-16 04:40:25 +00:00
Alex Bradbury e9ad0cf6cf [RISCV] Fix a potential issue in shouldInsertFixupForCodeAlign()
The bool result of shouldInsertExtraNopBytesForCodeAlign() is not checked but
the returned nop count is unconditionally read even though it could be
uninitialized.

Differential Revision: https://reviews.llvm.org/D63285
Patch by Edward Jones.

llvm-svn: 366175
2019-07-16 04:37:19 +00:00
Alex Bradbury ef8577ef98 [RISCV][NFC] Split PseudoCALL pattern out from instruction
Since PseudoCALL defines AsmString, it can be generated from assembly,
and so code-gen patterns should be defined separately to be consistent
with the style of the RISCV backend. Other pseudo-instructions exist
that have code-gen patterns defined directly, but these instructions are
purely for code-gen and cannot be written in assembly.

Differential Revision: https://reviews.llvm.org/D64012
Patch by James Clarke.

llvm-svn: 366174
2019-07-16 03:56:45 +00:00
Alex Bradbury a3c7b27419 [RISCV][NFC] Fix HasStedExtA -> HasStdExtA typo in comment
Differential Revision: https://reviews.llvm.org/D64011
Patch by James Clarke.

llvm-svn: 366173
2019-07-16 03:54:08 +00:00
Alex Bradbury 4ac0b9be23 [RISCV] Make RISCVELFObjectWriter::getRelocType check IsPCRel
Previously, this function didn't check the IsPCRel argument. But doing so is a
useful check for errors, and also seemingly necessary for FK_Data_4 (which we
produce a R_RISCV_32_PCREL relocation for if IsPCRel).

Other than R_RISCV_32_PCREL, this should be NFC. Future exception handling
related patches will include tests that capture this behaviour.

llvm-svn: 366172
2019-07-16 03:47:34 +00:00
Peter Collingbourne e5c4b468f0 hwasan: Pad arrays with non-1 size correctly.
Spotted by eugenis.

Differential Revision: https://reviews.llvm.org/D64783

llvm-svn: 366171
2019-07-16 03:25:50 +00:00
Matt Arsenault 1739b700b1 AMDGPU: Avoid code predicates for extload PatFrags
Use the MemoryVT field. This will be necessary for tablegen to
automatically handle patterns for GlobalISel.

Doesn't handle the d16 lo/hi patterns. Those are a special case since
it involvess the custom node type.

llvm-svn: 366168
2019-07-16 02:46:05 +00:00
Jonas Devlieghere ca16d280f7 Re-land "[DebugInfo] Move function from line table to the prologue (NFC)"
In LLDB, when parsing type units, we don't need to parse the whole line
table. Instead, we only need to parse the "support files" from the line
table prologue.

To make that possible, this patch moves the respective functions from
the LineTable into the Prologue. Because I don't think users of the
LineTable should have to know that these files come from the Prologue,

I've left the original methods in place, and made them redirect to the
LineTable.

Differential revision: https://reviews.llvm.org/D64774

llvm-svn: 366164
2019-07-16 01:21:25 +00:00
Michael Liao 543ba4e9e0 [InstructionSimplify] Apply sext/trunc after pointer stripping
Summary:
- As the pointer stripping could trace through `addrspacecast` now, need
  to sext/trunc the offset to ensure it has the same width as the
  pointer after stripping.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64768

llvm-svn: 366162
2019-07-16 01:03:06 +00:00
Jonas Devlieghere 01ee172e9e Revert "[DebugInfo] Move function from line table to the prologue (NFC)"
This broke LLD, which I didn't have enabled.

llvm-svn: 366160
2019-07-16 00:59:04 +00:00
Jonas Devlieghere 509903e887 [DebugInfo] Move function from line table to the prologue (NFC)
In LLDB, when parsing type units, we don't need to parse the whole line
table. Instead, we only need to parse the "support files" from the line
table prologue.

To make that possible, this patch moves the respective functions from
the LineTable into the Prologue. Because I don't think users of the
LineTable should have to know that these files come from the Prologue,

I've left the original methods in place, and made them redirect to the
LineTable.

Differential revision: https://reviews.llvm.org/D64774

llvm-svn: 366158
2019-07-16 00:37:17 +00:00
Eric Christopher 93dfb93ad6 Temporarily Revert "[SLP] Recommit: Look-ahead operand reordering heuristic."
As there are some reported miscompiles with AVX512 and performance regressions
in Eigen. Verified with the original committer and testcases will be forthcoming.

This reverts commit r364964.

llvm-svn: 366154
2019-07-15 23:36:02 +00:00
Leonard Chan bb147aabc6 Revert "[NewPM] Port Sancov"
This reverts commit 5652f35817.

llvm-svn: 366153
2019-07-15 23:18:31 +00:00
Craig Topper 51193871da [X86] Teach convertToThreeAddress to handle SUB with immediate
We mostly avoid sub with immediate but there are a couple cases that can create them. One is the add 128, %rax -> sub -128, %rax trick in isel. The other is when a SUB immediate gets created for a compare where both the flags and the subtract value is used. If we are unable to linearize the SelectionDAG to satisfy the flag user and the sub result user from the same instruction, we will clone the sub immediate for the two uses. The one that produces flags will eventually become a compare. The other will have its flag output dead, and could then be considered for LEA creation.

I added additional test cases to add.ll to show the the sub -128 trick gets converted to LEA and a case where we don't need to convert it.

This showed up in the current codegen for PR42571.

Differential Revision: https://reviews.llvm.org/D64574

llvm-svn: 366151
2019-07-15 23:07:56 +00:00
Heejin Ahn 1cf6922660 [WebAssembly] Add missing utility methods for exnref type
Summary:
This adds missing utility methods and copy instruction handling for
`exnref` type and also adds tests.

`tee` instruction tests are missing because `isTee` is currently only
used in ExplicitLocals pass and testing that pass in mir requires
serialization of stackified registers in mir files, which is a bit
nontrivial because `MachineFunctionInfo` only has info of vreg numbers
(which are large integers) but not the mir's register numbers. But this
change is quite trivial anyway.

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64705

llvm-svn: 366149
2019-07-15 23:04:00 +00:00
Heejin Ahn 9f96a58ccc [WebAssembly] Rename except_ref type to exnref
Summary:
We agreed to rename `except_ref` to `exnref` for consistency with other
reference types in
https://github.com/WebAssembly/exception-handling/issues/79. This also
renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order
to use the file for other reference types in future.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64703

llvm-svn: 366145
2019-07-15 22:49:25 +00:00
Wouter van Oortmerssen 292e21d8bc [WebAssembly] Assembler: support special floats: infinity / nan
Summary:
These are emitted as identifiers by the InstPrinter, so we should
parse them as such. These could potentially clash with symbols of
the same name, but that is out of our (the WebAssembly backend) control.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64770

llvm-svn: 366139
2019-07-15 22:13:39 +00:00
Austin Kerbow 423b4a18a4 [AMDGPU] Enable merging m0 initializations.
Summary:
Enable hoisting and merging m0 defs that are initialized with the same
immediate value. Fixes bug where removed instructions are not considered
to interfere with other inits, and make sure to not hoist inits before block
prologues.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64766

llvm-svn: 366135
2019-07-15 22:07:05 +00:00
Simon Atanasyan becae2b232 [mips] Print BEQZL and BNEZL pseudo instructions
One of the reasons - to be compatible with GNU tools.

llvm-svn: 366133
2019-07-15 21:46:38 +00:00
Matt Arsenault b082f1055b AMDGPU: Use standalone MUBUF load patterns
We already do this for the flat and DS instructions, although it is
certainly uglier and more verbose.

This will allow using separate pattern definitions for extload and
zextload. Currently we get away with using a single PatFrag with
custom predicate code to check if the extension type is a zextload or
anyextload. The generic mechanism the global isel emitter understands
treats these as mutually exclusive. I was considering making the
pattern emitter accept zextload or sextload extensions for anyextload
patterns, but in global isel, the different extending loads have
distinct opcodes, and there is currently no mechanism for an opcode
matcher to try multiple (and there probably is very little need for
one beyond this case).

llvm-svn: 366132
2019-07-15 21:41:44 +00:00
Nick Desaulniers c4f245b40a [LoopUnroll+LoopUnswitch] do not transform loops containing callbr
Summary:
There is currently a correctness issue when unrolling loops containing
callbr's where their indirect targets are being updated correctly to the
newly created labels, but their operands are not.  This manifests in
unrolled loops where the second and subsequent copies of callbr
instructions have blockaddresses of the label from the first instance of
the unrolled loop, which would result in nonsensical runtime control
flow.

For now, conservatively do not unroll the loop.  In the future, I think
we can pursue unrolling such loops provided we transform the cloned
callbr's operands correctly.

Such a transform and its legalities are being discussed in:
https://reviews.llvm.org/D64101

Link: https://bugs.llvm.org/show_bug.cgi?id=42489
Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI

Reviewers: fhahn, hfinkel, efriedma

Reviewed By: fhahn, hfinkel, efriedma

Subscribers: efriedma, hiraditya, zzheng, dmgreen, llvm-commits, pirama, kees, nathanchance, E5ten, craig.topper, chandlerc, glider, void, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64368

llvm-svn: 366130
2019-07-15 21:16:29 +00:00
Matt Arsenault 66ee934440 AMDGPU/GlobalISel: Allow scalar s1 and/or/xor
If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to
whether the result is 0. If the inputs are SCC, these can be copied to
a 32-bit SGPR to produce an SCC result.

llvm-svn: 366125
2019-07-15 20:20:18 +00:00
Evgeniy Stepanov c5e7f56249 ARM MTE stack sanitizer.
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.

It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differencies to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.

The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.

Reviewers: pcc, hctim, vitalybuka, ostannard

Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64169

llvm-svn: 366123
2019-07-15 20:02:23 +00:00
Matt Arsenault c8291c94f8 AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR
llvm-svn: 366121
2019-07-15 19:50:07 +00:00
Matt Arsenault ad19b50c00 AMDGPU/GlobalISel: Don't constrain source register of VCC copies
This is a hack until I come up with a better way of dealing with the
pseudo-register banks used for boolean values. If the use instruction
constrains the register, the selector for the def instruction won't
see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have
been SCCRegBank or VCCRegBank in wave32.

This is necessary to successfully select branches with and and/or/xor
condition.

llvm-svn: 366120
2019-07-15 19:48:36 +00:00
Matt Arsenault e1b52f4180 AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies
The extra test change is correct, although how it arrives there is a
bug that needs work. With wave32, the test for isVCC ambiguously
reports true for an SCC or VCC source. A new allocatable pseudo
register class for SCC may be necesssary.

llvm-svn: 366119
2019-07-15 19:46:48 +00:00
Matt Arsenault 3bfdb54d88 AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC
llvm-svn: 366118
2019-07-15 19:45:49 +00:00
Matt Arsenault 18b7133843 AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC
This was emitting a copy from a 32-bit register to a 64-bit.

llvm-svn: 366117
2019-07-15 19:44:07 +00:00
Matt Arsenault 6ed315f89b AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT
llvm-svn: 366116
2019-07-15 19:43:04 +00:00
Matt Arsenault b0e04c018c AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT
Turn the constant cases into G_EXTRACTs.

llvm-svn: 366115
2019-07-15 19:40:59 +00:00
Matt Arsenault 5dfd466032 AMDGPU/GlobalISel: Fix G_ICMP for wave32
llvm-svn: 366114
2019-07-15 19:39:31 +00:00
Matt Arsenault 434d664095 GlobalISel: Implement narrowScalar for vector extract/insert indexes
llvm-svn: 366113
2019-07-15 19:37:34 +00:00
Thomas Preud'homme 99f2a10870 [FileCheck] Store line numbers as optional values
Summary:
Processing of command-line definition of variable and logic around
implicit not directives both reuse parsing code that expects a line
number to be defined. So far, a special line number of 0 was used for
those users of the parsing code where a line number does not make sense.
This commit instead represents line numbers as Optional values so that
they can be None for those cases.

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64639

llvm-svn: 366109
2019-07-15 19:04:56 +00:00
Nico Weber ac6375d99d Expand comment about how StringsToBuckets was computed, and add more entries
The construction was explained in
https://reviews.llvm.org/D44810?id=139526#inline-391999 but reading the code
shouldn't require hunting down old reviews to understand it.

The precomputed list was missing an entry for the empty list case, and
one entry at the very end. (The current last entry is the last one where
3 * BucketCount fits in a signed int, but the reference implementation
uses unsigneds as far as I can tell, so there's room for one more entry.)

No behavior change for inputs seen in practice.

Differential Revision: https://reviews.llvm.org/D64738

llvm-svn: 366107
2019-07-15 18:56:56 +00:00
David Green dc56995c57 [ARM] MVE vector for 64bit types
We need to make sure that we are sensibly dealing with vectors of types v2i64
and v2f64, even if most of the time we cannot generate native operations for
them. This mostly adds a lot of testing, plus fixes up a couple of the issues
found. And, or and xor can be legal for v2i64, and shifts combining needs a
slight fixup.

Differential Revision: https://reviews.llvm.org/D64316

llvm-svn: 366106
2019-07-15 18:42:54 +00:00
Wouter van Oortmerssen b2a0745e2d [WebAssembly] Assembler: recognize .init_array as data section.
Reviewers: sbc100

Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64602

llvm-svn: 366104
2019-07-15 18:36:07 +00:00
Matt Arsenault 90bdfb3daf AMDGPU/GlobalISel: Widen vector extracts
llvm-svn: 366103
2019-07-15 18:31:10 +00:00
Matt Arsenault 53fa759ff5 AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break
llvm-svn: 366102
2019-07-15 18:25:24 +00:00
Matt Arsenault b390121efb AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf
llvm-svn: 366099
2019-07-15 18:18:46 +00:00
Sanjay Patel eb99165b97 [x86] try to keep FP casted+truncated+extracted vector element out of GPRs
inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0)

We have pseudo-vectorization of scalar int to FP casts, so this tries to
make that more likely by replacing a truncate with a bitcast. I didn't see
any test diffs starting from 'uitofp', so I left that as a TODO. We can't
only match the shorter trunc+extract pattern because there's an opposing
transform somewhere, so we infinite loop. Waiting to try this during
lowering is another possibility.

A motivating case is shown in PR39975 and included in the test diffs here:
https://bugs.llvm.org/show_bug.cgi?id=39975

Differential Revision: https://reviews.llvm.org/D64710

llvm-svn: 366098
2019-07-15 18:17:23 +00:00
Stella Stamenova 032e3c468f [llvm-lib] Add a dependency to intrinsics_gen to the LLVMLibDriver build
Summary:
Occasionally the build of LLVMLibDriver will fail because Attributes.inc has not been generated yet. Add an explicit dependency, so that we can guarantee that the file has been generated before LLVMLibDriver is build.

##[error]llvm\include\llvm\IR\Attributes.h(73,0): Error C1083: Cannot open include file: 'llvm/IR/Attributes.inc': No such file or directory
llvm\include\llvm/IR/Attributes.h(73): fatal error C1083: Cannot open include file: 'llvm/IR/Attributes.inc': No such file or directory [LLVMLibDriver.vcxproj]

Reviewers: asmith

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64357

llvm-svn: 366097
2019-07-15 18:15:12 +00:00
Craig Topper 81971b2b79 [X86] Return UNDEF from LowerScalarImmediateShift when the shift amount is out of range.
I think we only turn out of range shiftss to undef when
all elements are out of range or the shift amount is a splat out
of range. I'm not sure which, I didn't check.

During lowering we can split a shift where some elements
are out of range into multiple shifts. This can create a
new shift with a splat shift amount that is out of range.

This patch returns undef for this case.

Fixes PR42615.

Differential Revision: https://reviews.llvm.org/D64699

llvm-svn: 366096
2019-07-15 17:56:57 +00:00
Matt Arsenault 49169a963e AMDGPU: Add 24-bit mul intrinsics
Insert these during codegenprepare.

This works around a DAG issue where generic combines eliminate the and
asserting the high bits are zero, which then exposes an unknown read
source to the mul combine. It doesn't worth the hassle of trying to
insert an AssertZext or something to try to deal with it.

llvm-svn: 366094
2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin 7938424eb9 [AMDGPU] Copy missing predicate from pseudo to real
NFC at the momemnt, needed for future commit.

Differential Revision: https://reviews.llvm.org/D64761

llvm-svn: 366092
2019-07-15 17:49:25 +00:00
Johannes Doerfert 3dcd7996f1 [FunctionAttrs] Remove readonly and writeonly assertion
There are scenarios where mutually recursive functions may cause the SCC
to contain both read only and write only functions. This removes an
assertion when adding read attributes which caused a crash with a the
provided test case, and instead just doesn't add the attributes.

Patch by Luke Lau <luke.lau@intel.com>

Differential Revision: https://reviews.llvm.org/D60761

llvm-svn: 366090
2019-07-15 17:31:26 +00:00
David Green 8e7eee617a [ARM] Minor formatting in ARMInstrMVE.td. NFC
llvm-svn: 366089
2019-07-15 17:29:06 +00:00
Matt Arsenault a65913e752 AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR
llvm-svn: 366087
2019-07-15 17:26:43 +00:00
Matt Arsenault cc02b17082 AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS
llvm-svn: 366086
2019-07-15 17:20:40 +00:00
Stanislav Mekhanoshin fd08dcb9db [AMDGPU] fixed scheduler crash in gfx908
For some reason scheduler can send down an SUnit without an
instruction.

Differential Revision: https://reviews.llvm.org/D64709

llvm-svn: 366074
2019-07-15 15:34:05 +00:00
Dmitry Preobrazhensky 5153b1723a [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message
Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64729

llvm-svn: 366071
2019-07-15 15:12:16 +00:00
Dmitry Preobrazhensky 8d879c8d95 [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions
See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64716

llvm-svn: 366067
2019-07-15 14:37:57 +00:00
Simon Pilgrim 60fb5e97a0 [X86] isTargetShuffleEquivalent - assert the expected mask is correctly formed. NFCI.
While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values.

llvm-svn: 366066
2019-07-15 14:29:14 +00:00
Simon Atanasyan 83ae0b5eb4 [mips] Remove "else-after-return". NFC
llvm-svn: 366064
2019-07-15 13:12:36 +00:00
David Green 6e89887642 [ARM] MVE Vector Shifts
This adds basic lowering for MVE shifts. There are many shifts in MVE, but the
instructions handled here are:
 VSHL (imm)
 VSHRu (imm)
 VSHRs (imm)
 VSHL (vector)
 VSHL (register)

MVE, like NEON before it, doesn't have shift right by a vector (or register).
We instead have to negate the amount and shift in the opposite direction. This
means we have to convert any SHR's into a form of SHL (that is still signed or
unsigned) with a negated condition and selecting from there. MVE still does
have shifting by an immediate for SHL, ASR and LSR.

This adds lowering for these and for register forms, which work well for shift
lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially
work optimally for right shifts.

Differential Revision: https://reviews.llvm.org/D64212

llvm-svn: 366056
2019-07-15 11:35:39 +00:00
David Green f059147a10 [ARM] Move Shifts after Bits. NFC
This just moves the shift instruction definitions further down the
ARMInstrMVE.td file, to make positioning patterns slightly more natural.

llvm-svn: 366054
2019-07-15 11:22:05 +00:00
David Green da750b1688 [ARM] Adjust how NEON shifts are lowered
This adjusts the way that we lower NEON shifts to use a DAG target node, not
via a neon intrinsic. This is useful for handling MVE shifts operations in the
same the way. It also renames some of the immediate shift nodes for
consistency, and moves some of the processing of immediate shifts into
LowerShift allowing it to capture more cases.

Differential Revision: https://reviews.llvm.org/D64426

llvm-svn: 366051
2019-07-15 10:44:50 +00:00
Serguei Katkov d021ad9fbe [Loop Peeling] Fix the bug with IDom setting for exit loops
It is possible that loop exit has two predecessors in a loop body.
In this case after the peeling the iDom of the exit should be a clone of
iDom of original exit but no a clone of a block coming to this exit.

Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64618

llvm-svn: 366050
2019-07-15 09:13:11 +00:00
Florian Hahn 1d554b7441 [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost.
We do not compute the scalarization overhead in getVectorIntrinsicCost
and TTI::getIntrinsicInstrCost requires the full arguments list.

llvm-svn: 366049
2019-07-15 08:48:47 +00:00
Serguei Katkov 3ed93b4673 [Loop Peeling] Enable peeling for loops with multiple exits
This CL enables peeling of the loop with multiple exits where
one exit should be from latch and others are basic blocks with
call to deopt.

The peeling is enabled under the flag which is false by default.

Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D63923

llvm-svn: 366048
2019-07-15 08:26:45 +00:00
Hideto Ueno 54869ec907 [Attributor] Deduce "nonnull" attribute
Summary:
Porting nonnull attribute to attributor.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63604

llvm-svn: 366043
2019-07-15 06:49:04 +00:00
Serguei Katkov 45c43e7d04 [LoopUtils] Extend the scope of getLoopEstimatedTripCount
With this patch the getLoopEstimatedTripCount function will
accept also the loops where there are more than one exit but
all exits except latch block should ends up with a call to deopt.

This side exits should not impact the estimated trip count.

Reviewers: reames, mkuper, danielcdh
Reviewed By: reames
Subscribers: fhahn, lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64553

llvm-svn: 366042
2019-07-15 06:42:39 +00:00
Bill Wendling 796ed134cc Remove set but unused variable.
llvm-svn: 366041
2019-07-15 06:35:28 +00:00
Serguei Katkov f1ee04c42a [LoopInfo] Introduce getUniqueNonLatchExitBlocks utility function
Extract the code from LoopUnrollRuntime into utility function to
re-use it in D63923.

Reviewers: reames, mkuper
Reviewed By: reames
Subscribers: fhahn, hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64548

llvm-svn: 366040
2019-07-15 05:51:10 +00:00
Fangrui Song 335f955dc4 [PowerPC] Support fp128 libcalls
On PowerPC, IEEE 754 quadruple-precision libcall names use "kf" instead of "tf".

In libgcc, libgcc/config/rs6000/float128-sed converts TF names to KF
names. This patch implements its 24 substitution rules.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D64282

llvm-svn: 366039
2019-07-15 05:02:32 +00:00
Johannes Doerfert 2d63fbb7b1 [ValueTracking] Look through constant Int2Ptr/Ptr2Int expressions
Summary:
This is analogous to the int2ptr/ptr2int instruction handling introduced
in D54956.

Reviewers: fhahn, efriedma, spatel, nlopes, sanjoy, lebedev.ri

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64708

llvm-svn: 366036
2019-07-15 03:24:35 +00:00
Craig Topper 635d103e0b [X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them improve the codegen of v2f32 loads/stores with sse1 only.
Summary:
SSE1 only supports v4f32. But does have instructions like movlps/movhps that load/store 64-bits of memory.

This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT. Enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.

I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.

If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64528

llvm-svn: 366034
2019-07-15 02:02:31 +00:00
Alexandros Lamprineas 951bb68ce2 [TargetParser][ARM] Account dependencies when processing target features
Teaches ARM::appendArchExtFeatures to account dependencies when processing
target features: i.e. when you say -march=armv8.1-m.main+mve.fp+nofp it
means mve.fp should get discarded too. (Split from D63936)

Differential Revision: https://reviews.llvm.org/D64048

llvm-svn: 366031
2019-07-14 20:31:15 +00:00
Florian Hahn 9428d95ce7 [LV] Exclude loop-invariant inputs from scalar cost computation.
Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.

Fixes PR41294

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D59995

llvm-svn: 366030
2019-07-14 20:12:36 +00:00
Alexandros Lamprineas 24cacf9c56 [clang][Driver][ARM] Favor -mfpu over default CPU features
When processing the command line options march, mcpu and mfpu, we store
the implied target features on a vector. The change D62998 introduced a
temporary vector, where the processed features get accumulated. When
calling DecodeARMFeaturesFromCPU, which sets the default features for
the specified CPU, we certainly don't want to override the features
that have been explicitly specified on the command line. Therefore, the
default features should appear first in the final vector. This problem
became evident once I added the missing (unhandled) target features in
ARM::getExtensionFeatures.

Differential Revision: https://reviews.llvm.org/D63936

llvm-svn: 366027
2019-07-14 18:32:42 +00:00
Florian Hahn 19d3fdb08b Recommit "[BitcodeReader] Validate OpNum, before accessing Record array."
This recommits r365750 (git commit 8b222ecf27)

Original message:

   Currently invalid bitcode files can cause a crash, when OpNum exceeds
   the number of elements in Record, like in the attached bitcode file.

   The test case was generated by clusterfuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15698

   Reviewers: t.p.northover, thegameg, jfb

   Reviewed By: jfb

   Differential Revision: https://reviews.llvm.org/D64507

   llvm-svn: 365750jkkkk

llvm-svn: 366018
2019-07-14 14:06:25 +00:00
Florian Hahn 864474c9c7 [BitcodeReader] Use tighter upper bound to validate forward references.
At the moment, bitcode files with invalid forward reference can easily
cause the bitcode reader to run out of memory, by creating a forward
reference with a very high index.

We can use the size of the bitcode file as an upper bound, because a
valid bitcode file can never contain more records. This should be
sufficient to fail early in most cases. The only exception is large
files with invalid forward references close to the file size.

There are a couple of clusterfuzz runs that fail with out-of-memory
because of very high forward references and they should be fixed by this
patch.

A concrete example for this is D64507, which causes out-of-memory on
systems with low memory, like the hexagon upstream bots.

Reviewers: t.p.northover, thegameg, jfb, efriedma, hfinkel

Reviewed By: jfb

Differential Revision: https://reviews.llvm.org/D64577

llvm-svn: 366017
2019-07-14 12:35:50 +00:00
Craig Topper 9450b0084a [X86] Remove offset of 8 from the call to FuseInst for UNPCKLPDrr folding added in r365287.
This was copy/pasted from above and I forgot to change it. We just
need the default offset of 0 here.

Fixes PR42616.

llvm-svn: 366011
2019-07-14 04:13:33 +00:00
David Green 458a720ec1 [ARM] Add sign and zero extend patterns for MVE
The vmovlb instructions can be uses to sign or zero extend vector registers
between types. This adds some patterns for them and relevant testing. The
VBICIMM generation is also put behind a hasNEON check (as is already done for
VORRIMM).

Code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D64069

llvm-svn: 366008
2019-07-13 15:43:00 +00:00
David Green 07a7ec2021 [ARM] MVE VNEG instruction patterns
This selects integer VNEG instructions, which can be especially useful with shifts.

Differential Revision: https://reviews.llvm.org/D64204

llvm-svn: 366006
2019-07-13 15:26:51 +00:00
David Green 4ce648b5e8 [ARM] MVE integer abs
Similar to floating point abs, we also have instructions for integers.

Differential Revision: https://reviews.llvm.org/D64027

llvm-svn: 366005
2019-07-13 14:58:32 +00:00
David Green 701bf714db [ARM] MVE integer min and max
This simply makes the MVE integer min and max instructions legal and adds the
relevant patterns for them.

Differential Revision: https://reviews.llvm.org/D64026

llvm-svn: 366004
2019-07-13 14:48:54 +00:00
David Green ac5bcbeb9f [ARM] MVE VRINT support
This adds support for the floor/ceil/trunc/... series of instructions,
converting to various forms of VRINT. They use the same suffixes as their
floating point counterparts. There is not VTINTR, so nearbyint is expanded.

Also added a copysign test, to show it is expanded.

Differential Revision: https://reviews.llvm.org/D63985

llvm-svn: 366003
2019-07-13 14:38:53 +00:00
David Green ec8af0db6c [ARM] MVE minnm and maxnm instructions
This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes,
similar to scalar types.

Original patch by Simon Tatham

Differential Revision: https://reviews.llvm.org/D63870

llvm-svn: 366002
2019-07-13 14:29:02 +00:00
Thomas Preud'homme 2a7f520460 FileCheck [7/12]: Arbitrary long numeric expressions
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch extend numeric expression to
support an arbitrary number of operands, either variable or literals.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60387

llvm-svn: 366001
2019-07-13 13:24:30 +00:00
Sanjay Patel 2097f75eab [x86] simplify cmov with same true/false operands
llvm-svn: 365998
2019-07-13 12:04:52 +00:00
Fangrui Song 327db23b66 [Object] isNotObjectErrorInvalidFileType: simplify
llvm-svn: 365997
2019-07-13 09:28:33 +00:00
Fangrui Song 16ac7a5a27 [Object] isNotObjectErrorInvalidFileType: fix use-after-move
llvm-svn: 365996
2019-07-13 09:23:35 +00:00
Johannes Doerfert c7a1db3298 [Attributor][NFC] Run clang-format on the attributor files (.h/.cpp)
The Attributor files are kept formatted with clang-format, we should try
to keep this state.

llvm-svn: 365984
2019-07-13 01:09:27 +00:00
Johannes Doerfert 0a7f4cdce9 [Attributor] Only return attributes with a valid state
Attributor::getAAFor will now only return AbstractAttributes with a
valid AbstractState. This simplifies call sites as they only need to
check if the returned pointer is non-null. It also reduces the potential
for accidental misuse.

llvm-svn: 365983
2019-07-13 01:09:21 +00:00
Evgeniy Stepanov 41c22b4390 Extend function attributes bitset size from 64 to 96.
Summary: We are going to add a function attribute number 64.

Reviewers: pcc, jdoerfert, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64663

llvm-svn: 365980
2019-07-13 00:29:03 +00:00
Nico Weber 51a52b5893 PDB HashTable: Move TraitsT from class parameter to the methods that need it
The traits object is only used by a few methods. Deserializing a hash
table and walking it is possible without the traits object, so it
shouldn't be required to build a dummy object for that use case.

The TraitsT object used to be a function template parameter before
r327647, this restores it to that state.

This makes it clear that the traits object isn't needed at all in 1 of
the current 3 uses of HashTable (and I am going to add another use that
doesn't need it), and that the default PdbHashTraits isn't used outside
of tests.

While here, also re-enable 3 checks in the test that were commented out
(which requires making HashTableInternals templated and giving FooBar
an operator==).

No intended behavior change.

Differential Revision: https://reviews.llvm.org/D64640

llvm-svn: 365974
2019-07-12 23:30:55 +00:00
Stanislav Mekhanoshin 1dfae6fe50 [AMDGPU] use v32f32 for 3 mfma intrinsics
These should really use v32f32, but were defined as v32i32
due to the lack of the v32f32 type.

Differential Revision: https://reviews.llvm.org/D64667

llvm-svn: 365972
2019-07-12 22:42:01 +00:00
Vitaly Buka b1bff76e22 isBytewiseValue checks ConstantVector element by element
Summary: Vector of the same value with few undefs will sill be considered "Bytewise"

Reviewers: eugenis, pcc, jfb

Reviewed By: jfb

Subscribers: dexonsmith, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64031

llvm-svn: 365971
2019-07-12 22:37:55 +00:00
Alina Sbirlea db101864bd [MemorySSA] Use SetVector to avoid nondeterminism.
Summary:
Use a SetVector for DeadBlockSet.
Resolves PR42574.

Reviewers: george.burgess.iv, uabelho, dblaikie

Subscribers: jlebar, Prazek, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64601

llvm-svn: 365970
2019-07-12 22:30:30 +00:00
Wouter van Oortmerssen d8ddf83950 [WebAssembly] refactored utilities to not depend on MachineInstr
Summary:
Most of these functions can work for MachineInstr and MCInst
equally now.

Reviewers: dschuff

Subscribers: MatzeB, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64643

llvm-svn: 365965
2019-07-12 22:08:25 +00:00
Alex Lorenz 6d187f0eff [macCatalyst] Use macCatalyst pretty name in .build_version darwin
assembly command

'macCatalyst' is more readable than 'maccatalyst'. I renamed the objdump output,
but the assembly should match it as well.

llvm-svn: 365964
2019-07-12 22:06:08 +00:00
David Bolvansky 9f0d718c66 [InstCombine] Disable fold from D64285 for non-integer types
llvm-svn: 365959
2019-07-12 21:14:21 +00:00
Evgeniy Stepanov 32452487ae Factor out resolveFrameOffsetReference (NFC).
Split AArch64FrameLowering::resolveFrameIndexReference in two parts
* Finding frame offset for the index.
* Finding base register and offset to that register.

The second part will be used to implement a virtual frame pointer in
armv8.5 MTE stack instrumentation lowering.

Reviewers: pcc, vitalybuka, hctim, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64171

llvm-svn: 365958
2019-07-12 21:13:55 +00:00
Matt Arsenault 51a05d72ae AMDGPU: Drop remnants of byval support for shaders
Before 2018, mesa used to use byval interchangably with inreg, which
didn't really make sense. Fix tests still using it to avoid breaking
in a future commit.

llvm-svn: 365953
2019-07-12 20:12:17 +00:00
David Tenty ae79a2c390 Fix missing use of defined() in include guard
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64657

llvm-svn: 365952
2019-07-12 20:12:15 +00:00
Nikita Popov 411fa4c0df [SystemZ] Fix addcarry of addcarry of const carry (PR42606)
This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending
D64213. Instead of only checking if the carry comes from a matching
operation, we now check the full chain of carries. Otherwise we might
custom lower the outermost addcarry, but then generically legalize
an inner addcarry.

Differential Revision: https://reviews.llvm.org/D64658

llvm-svn: 365949
2019-07-12 20:03:34 +00:00
Craig Topper b828f0b90a [X86] Use MachineInstr::findRegisterDefOperand to simplify some code in optimizeCompareInstr. NFCI
llvm-svn: 365946
2019-07-12 19:26:35 +00:00