Commit Graph

138742 Commits

Author SHA1 Message Date
Paul Walker b6c7b7fa31 [SVE] Add ISD nodes for predicated integer extend inreg operations.
These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as kind of NFC like work.

Differential Revision: https://reviews.llvm.org/D85546
2020-08-11 11:39:26 +01:00
Simon Pilgrim 49016eeab6 [X86] Rename combineVectorPackWithShuffle -> combineHorizOpWithShuffle. NFC.
The plan is to use this for (F)HADD/SUB opcodes as well as PACKs - similar to how we use combineShuffleWithHorizOp
2020-08-11 11:38:43 +01:00
Paul Walker d542feb8e4 [SVE] Lower fixed length vector integer subtract operations.
Differential Revision: https://reviews.llvm.org/D85665
2020-08-11 11:32:12 +01:00
Kai Nacke d6f710fd46 [NFC] Fix typo in comment.
Twelvth -> Twelfth
2020-08-11 05:27:56 -04:00
Kai Nacke b3aece0531 [SystemZ/ZOS] Add binary format goff and operating system zos to the triple
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as default binary format if zos is choosen as
operating system. No further functionality is added.

Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay

Reviewed By: efriedma, tahonermann, hubert.reinterpertcast

Differential Revision: https://reviews.llvm.org/D82081
2020-08-11 05:26:26 -04:00
Florian Hahn 0b774acf11 [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-08-11 11:18:12 +02:00
Dávid Bolvanský c2f0101310 [InstCombine] ~(~X + Y) -> X - Y
Proof:
https://alive2.llvm.org/ce/z/4xharr

Solves PR47051

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85593
2020-08-11 11:05:42 +02:00
Florian Hahn 7829c33084 [SCEVExpander] Add helper to clean up instrs inserted while expanding.
SCEVExpander already tracks which instructions have been inserted n
InsertedValues/InsertedPostIncValues. This patch adds an additional
vector to collect the instructions in insertion order. This can then be
used to remove exactly the instructions inserted by the expander.

This replaces ExpandedValuesCleaner, which in some cases might remove
values not inserted by the expander (e.g. if a value was dead before
insertion and is then used during expansion).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D84327
2020-08-11 09:30:31 +01:00
Sam Parker 8f92f3c2ea [RDA] Fix DBG_VALUE issues
We skip debug instructions in RDA so we cannot attempt to look them
up in our instruction map without causing a crash. But some of the
methods select the last instruction in the block and this
instruction may be a debug instruction... So, use getLastNonDebugInstr
instead of calling back on a MachineBasicBlock.

MachineBasicBlock iterators have also been updated to use
instructionsWithoutDebug so we can avoid the manual checks for debug
instructions.

Differential Revision: https://reviews.llvm.org/D85658
2020-08-11 09:03:09 +01:00
Juneyoung Lee 63b5b92bc9 [LazyValueInfo] Let getEdgeValueLocal look into freeze instructions
This patch makes getEdgeValueLocal more precise when a freeze instruction is
given, by adding support for freeze into constantFoldUser

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84629
2020-08-11 16:39:34 +09:00
Shinji Okumura 06eee8748f [Attributor][NFC] Connect AAPotentialValues with AAValueSimplify
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85668
2020-08-11 15:52:02 +09:00
Craig Topper 9201efb3b9 [X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns.
By factoring out the end of tryVPTERNLOG, we can use the same code
to directly match X86ISD::VPTERNLOG. This allows us to remove
around 3-4K worth of X86GenDAGISel.inc.
2020-08-10 23:15:58 -07:00
QingShan Zhang 61ede38da0 [CodeGen] Expand float operand for STRICT_FSETCC/STRICT_FSETCCS
This patch is the continue work of https://reviews.llvm.org/D69281
to implement the way that expands STRICT_FSETCC/STRICT_FSETCCS.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81906
2020-08-11 05:55:00 +00:00
Haowei Wu db91320a89 Revert "Move ELFObjHandler to TextAPI library"
This reverts commit e6f8ba12e6 due
to build failures.
2020-08-10 21:31:29 -07:00
Haowei Wu e6f8ba12e6 Move ELFObjHandler to TextAPI library
This change moves ELFObjHandler to llvm/TextAPI library so it can
be used by different llvm tools.
2020-08-10 21:23:39 -07:00
Wang, Pengfei 9512525947 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Xing GUO 3c5758964c [macho2yaml] Refactor the DWARF section dumpers.
This patch refactors the DWARF section dumpers. When dumping a DWARF
section, if the DWARF parser fails to parse the section, we will dump it
as a raw content section. This patch also fixes a bug in
DWARFYAML::Data::isEmpty(). Finally, a test case that tests dumping the
__debug_aranges section is added.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85506
2020-08-11 10:18:34 +08:00
Lang Hames 6fd30f0669 [llvm-jitlink] Update llvm-jitlink to use TargetProcessControl. 2020-08-10 17:19:48 -07:00
Johannes Doerfert fa5d22a045 [OpenMP][NFC] Reuse OMPIRBuilder `struct ident_t` handling in Clang
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D80735
2020-08-10 17:13:26 -05:00
jasonliu 20abff0481 [XCOFF][AIX] Use TE storage mapping class when large code model is enabled
Summary:
Use TE SMC instead of TC SMC in large code model mode,
so that large code model TOC entries could get placed after all
the small code model TOC entries, which reduces the chance of TOC overflow.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D85455
2020-08-10 19:52:10 +00:00
Puyan Lotfi 7bc03f5553 [MachineOutliner][AArch64] WA for multiple stack fixup cases in MachineOutliner.
In cases where MachineOutliner candidates either are:

  * noreturn
  * have calls with no available LR or free regs
  * Don't use SP

we can end up hitting stack fixup code for the caller and the callee for
a FrameID of MachineOutlinerDefault. This triggers the assert:

  `assert(OF.FrameConstructionID != MachineOutlinerDefault &&
          "Can only fix up stack references once");`

in AArch64InstrInfo.cpp. This assert exists for now because a lot of the
fixup code is not tested to handle fixing up more than once and needs
some better checks and enhancements to avoid potentially generating
illegal code.

I've filed a Bugzilla report to track this until these cases are handled
by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767

This diff detects cases that will cause these multiple stack fixups and
prune the Candidates from `RepeatedSequenceLocs`.

    Differential Revision: https://reviews.llvm.org/D83923
2020-08-10 15:43:30 -04:00
Wei Mi 4cd8e9b169 [SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles
for invoke instructions.

We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have profile instance in the sample profile
but compiler thinks it does, that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee
is enabled recently.

Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.

The patch set CalleeName when the instruction is an invoke.

Differential Revision: https://reviews.llvm.org/D85664
2020-08-10 12:41:09 -07:00
Thomas Lively 514445e035 [WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.

Requested in PR46982.

Differential Revision: https://reviews.llvm.org/D85392
2020-08-10 12:40:05 -07:00
Stanislav Mekhanoshin 08803f0e62 Unbundle KILL bundles in VirtRegRewriter
SplitKit forms invalid COPY subreg bundles without a leading
BUNDLE instruction. That manifests itself in post-RA scheduler
counting instruction and asserting on "Instruction count mismatch".

The bundle shall be undone by VirtRegRewriter::expandCopyBundle(),
but it does not because VirtRegRewriter::handleIdentityCopy() can
turn COPY bundle into a KILL bundle.

Process KILLs as well.

Differential Revision: https://reviews.llvm.org/D85484
2020-08-10 11:58:37 -07:00
Matt Arsenault 6fe6b29c29 AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers 2020-08-10 13:46:52 -04:00
Matt Arsenault 68fab44acf AMDGPU: Fix visiting physreg dest users when folding immediate copies
This can fold the immediate into the physical destination, but this
should not look for further users of the register. Fixes regression
introduced by 766cb615a3.
2020-08-10 13:46:51 -04:00
Alexandre Ganea a3036b3863 Re-Re-land: [CodeView] Add full repro to LF_BUILDINFO record
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).

Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.

For more information see PR36198 and D43002.

Differential Revision: https://reviews.llvm.org/D80833
2020-08-10 13:36:30 -04:00
Craig Topper 96dfc783b2 [BreakFalseDeps][X86] Move operand loop out of X86's getUndefRegClearance and put in the pass.
X86 is the only user of this interface in tree. Previously the
X86 pass would loop over operands looking for one undef operand for
the pass to fix. But there could theoretically be multiple operands
to fix. So it makes more sense for the pass to do the looping and
ask the target if an operand needs to be fixed.
2020-08-10 10:32:29 -07:00
Wouter van Oortmerssen 582fd474dd [WebAssembly] wasm64: fix memory.init operand types
I had assumed they would all become in i64, but this is not necessary as long as data segments stay 32-bit, see:
https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md

Differential Revision: https://reviews.llvm.org/D85552
2020-08-10 10:15:20 -07:00
Mircea Trofin 211117b660 [NFC][MLInliner] remove curly braces for a few sinle-line loops 2020-08-10 09:32:21 -07:00
Mircea Trofin d5c81be3ca [NFC][MLInliner] Set up the logger outside the development mode advisor
This allows us to subsequently configure the logger for the case when we
use a model evaluator and want to log additional outputs.

Differential Revision: https://reviews.llvm.org/D85577
2020-08-10 09:22:17 -07:00
Fangrui Song 3b21a07fd7 [PGO] Delete dead comdat renaming code related to GlobalAlias. NFC
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D85597
2020-08-10 09:02:04 -07:00
Simon Pilgrim 9a368d2b00 [X86][SSE] shuffle(hop,hop) - canonicalize unary hop(x,x) shuffle masks
If a shuffle is referring to both the lower and upper half lanes of an unary horizontal op, then canonicalize the mask to only refer to the lower half.
2020-08-10 16:09:27 +01:00
jasonliu 7866442b3f [XCOFF] Adjust .rename emission sequence
Summary:
AIX assembler does not generate correct relocation when .rename
appear between tc entry label and .tc directive.
So only emit .rename after .tc/.comm or other linkage is emitted.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85317
2020-08-10 14:48:24 +00:00
Xiangling Liao 6ef801aa6b [AIX] Static init frontend recovery and backend support
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.

On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.

Differential Revision: https://reviews.llvm.org/D84534
2020-08-10 10:10:49 -04:00
Simon Pilgrim 07e673a02b [X86][SSE] Pull out shuffle(hop,hop) combine into combineShuffleWithHorizOp helper. NFC. 2020-08-10 15:08:57 +01:00
Stefan Pintilie 81883ca074 [PowerPC] Add option to control PCRel GOT indirect linker optimization
Add a hidden option to the compiler to control a the PC Relative GOT indirect
linker optimization.

If this option is set to false the compiler will no loger produce the
relocations required by the linker to perform the optimization.

Reviewed By: nemanjai, NeHuang, #powerpc

Differential Revision: https://reviews.llvm.org/D85377
2020-08-10 09:07:17 -05:00
Sam Parker 4f9f4b21e0 [ARM] Unrestrict Armv8-a IT when at minsize
IT blocks with more than one instruction were performance deprecated in Armv8
but that doesn't mean we should follow that advise when optimising for size.

Differential Revision: https://reviews.llvm.org/D85638
2020-08-10 14:59:53 +01:00
James Henderson ca05601cd2 [DebugInfo] Don't error for zero-length arange entries
Although the DWARF specification states that .debug_aranges entries
can't have length zero, these can occur in the wild. There's no
particular reason to enforce this part of the spec, since functionally
they have no impact. The patch removes the error and introduces a new
warning for premature terminator entries which does not stop parsing.

This is a relanding of cb3a598c87, adding the missing obj2yaml part
that was needed.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46805. See also
https://reviews.llvm.org/D71932 which originally introduced the error.

Reviewed by: ikudrin, dblaikie, Higuoxing

Differential Revision: https://reviews.llvm.org/D85313
2020-08-10 14:57:52 +01:00
Simon Pilgrim e6dc2c8ce7 [X86][SSE] combineTargetShuffle - rearrange shuffle(hop,hop) matching to delay shuffle mask manipulation. NFC.
Check that we're shuffling hadd/pack ops first before altering shuffle masks.

First step towards adding extra functionality, plus it avoids costly shuffle mask manipulation if not necessary.
2020-08-10 14:13:19 +01:00
Matt Arsenault 40188f807d AMDGPU/GlobalISel: Don't try to handle undef source operand
This is now illegal MIR
2020-08-10 08:49:43 -04:00
Matt Arsenault f9c279b057 PeepholeOptimizer: Use Register 2020-08-10 08:49:36 -04:00
Matt Arsenault 0bbf4bb8db GlobalISel: Remove redundant check for empty blocks 2020-08-10 08:46:30 -04:00
Matt Arsenault a0ec81f70d AMDGPU/GlobalISel: Merge load/store select cases 2020-08-10 08:46:26 -04:00
Matt Arsenault c8b17874e5 AMDGPU/GlobalISel: Fix typo 2020-08-10 08:41:17 -04:00
Matt Arsenault 9533f0ea68 AMDGPU/GlobalISel: Use nicer form of buildInstr 2020-08-10 08:41:07 -04:00
Nico Weber bc5d68dd8a Revert "[DebugInfo] Don't error for zero-length arange entries"
This reverts commit cb3a598c87.
Breaks build of check-llvm dep obj2yaml everywhere.
2020-08-10 08:20:35 -04:00
Sanjay Patel bebca662d4 [InstCombine] rearrange code for readability; NFC
The code comment refers to the path where we change the
size of the integer type, so handle that first, otherwise
deal with the general case.
2020-08-10 08:07:29 -04:00
James Henderson cb3a598c87 [DebugInfo] Don't error for zero-length arange entries
Although the DWARF specification states that .debug_aranges entries
can't have length zero, these can occur in the wild. There's no
particular reason to enforce this part of the spec, since functionally
they have no impact. The patch removes the error and introduces a new
warning for premature terminator entries which does not stop parsing.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46805. See also
https://reviews.llvm.org/D71932 which originally introduced the error.

Reviewed by: ikudrin, dblaikie

Differential Revision: https://reviews.llvm.org/D85313
2020-08-10 12:48:31 +01:00
Florian Hahn 8393b9fd1f [LoopInterchange] Move instructions from preheader to outer loop header.
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.

Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.

Fixes PR45743
2020-08-10 12:41:33 +01:00
Florian Hahn 54cb552b96 [LoopInterchange] Form LCSSA phis for values in orig outer loop header.
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
2020-08-10 11:33:19 +01:00
Qiu Chaofan dbcfbffc7a [PowerPC] Add intrinsic to read or set FPSCR register
This patch introduces two intrinsics: llvm.ppc.setflm and
llvm.ppc.readflm. They read from or write to FPSCR register
(floating-point status & control) which contains rounding mode and
exception status.

To ensure correctness of program, we need to prevent FP operations from
being moved across these intrinsics (mffs/mtfsf instruction), so here I
set them as scheduling boundaries. We can relax such restriction if
FPSCR is modeled well in the future.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84914
2020-08-10 18:27:45 +08:00
Simon Pilgrim c0c3b9a25f [ScalarizeMaskedMemIntrin] Scalarize constant mask expandload as shuffle(build_vector,pass_through)
As noticed on D66004, scalarization of an expandload with a constant mask as a chain of irregular loads+inserts makes it tricky to optimize before lowering, resulting in difficulties in merging loads etc.

This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc.

Differential Revision: https://reviews.llvm.org/D85416
2020-08-10 11:05:57 +01:00
Igor Kudrin d400606f8c [DebugInfo] Fix initialization of DwarfCompileUnit::LabelBegin.
This also fixes the condition in the assertion in
DwarfCompileUnit::getLabelBegin() because it checked something unrelated
to the returned value.

Differential Revision: https://reviews.llvm.org/D85437
2020-08-10 15:57:21 +07:00
Petar Avramovic 0d58d9e8fb AMDGPU/GlobalISel: Lower G_FREM
Add custom lower for G_FREM.

Differential Revision: https://reviews.llvm.org/D84324
2020-08-10 10:10:46 +02:00
Vitaly Buka 1970eefb17 [NFC][StackSafety] Add a couple of early returns 2020-08-09 23:42:09 -07:00
Vitaly Buka 8d91ce8f58 [NFC][StackSafety] Count dataflow inputs 2020-08-09 23:32:41 -07:00
Vitaly Buka dee812a297 [StackSafety] Fix union which produces wrapped sets 2020-08-09 23:20:17 -07:00
Vitaly Buka a6feeb1c6b [NFC][StackSafety] Avoid assert in getBaseObjec 2020-08-09 23:20:17 -07:00
Juneyoung Lee ef018cb65c [BuildLibCalls] Add noundef to standard I/O functions
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85345
2020-08-10 10:58:25 +09:00
Vitaly Buka 3a34228bff [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Vitaly Buka 654266bea9 [StackSafety] Use getSignedMin() to serialize ranges
Almost NFC as it's important only for full sets which should not
be serialized at all.
2020-08-09 14:53:13 -07:00
Piotr Sobczak 62d8b8a225 Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.

Before introducing the S_CSELECT pattern with explicit SCC
(0045786f14), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).

The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.

The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).

Differential Revision: https://reviews.llvm.org/D85207
2020-08-09 20:50:30 +02:00
Florian Hahn d236e1c7b6 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Florian Hahn 23817cbd0b [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast, if it matches the builder's IP.
2020-08-09 16:51:19 +01:00
Aditya Kumar 53ac144848 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Sanjay Patel 43bdac2906 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Florian Hahn c70f0b9d4a [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
David Green 186a7f81e8 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16's vecreduce, using all the existing code
to go via an i32 VADDV/VMLAV and truncating the result.

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green 8590e5abad [ARM] Allow vecreduce_add in tail predicated loops
This allows vecreduce_add in loops so that we can tailpredicate them.

Differential Revision: https://reviews.llvm.org/D85454
2020-08-09 10:57:17 +01:00
David Green 296faa91ed [ARM] Some formatting and predicate VRHADD patterns. NFC
This formats some of the MVE patterns, and adds a missing
Predicates = [HasMVEInt] to some VRHADD patterns I noticed
as going through. Although I don't believe NEON would ever
use the patterns (as it would use ADDL and VSHRN instead)
they should ideally be predicated on having MVE instructions.
2020-08-09 10:07:52 +01:00
Craig Topper bc8be30540 [X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64.
These all seem to be handled by tablegen pattern imports.
2020-08-09 00:26:15 -07:00
Craig Topper fdfdee98ac [DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1.
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.

While there make use of APInt:isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use
SelectionDAG::getAllOnesConstant helper to hide the all ones
APInt creation.
2020-08-08 22:27:16 -07:00
Petr Hosek a4d78d23c5 Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit ccbc1485b5 which
is still failing on the Windows MLIR bots.
2020-08-08 17:08:23 -07:00
Petr Hosek ccbc1485b5 [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-08 16:44:08 -07:00
Thomas Lively cc612c2908 [WebAssembly] Fix FastISel address calculation bug
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.

This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.

Differential Revision: https://reviews.llvm.org/D85581
2020-08-08 15:23:11 -07:00
Craig Topper d3153b5ca2 [X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.
This was blocking isTypeLegal call so that we could do a particular
transform on illegal types before type legalization. But the we
create a target specific node using that type. We shouldn't do
that if the type isn't legal. So I think we should just always
make sure the type is legal.

I suspect that in order to get the condition VT to not be a vector
of i1 we already completed type legalization anyway so this probably
doesn't matter much in practice.
2020-08-08 14:19:13 -07:00
Craig Topper 966a58e329 [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Dávid Bolvanský c814eca3e4 [AArch64RegisterInfo] Supress new warning 2020-08-08 21:47:01 +02:00
Craig Topper 815a9b256b [X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.
This is just a thin wrapper around computeRegisterLivness which
we can just call directly. The only real difference is that
isSafeToClobberEFLAGS returns a bool and computeRegisterLivness
returns an enum. So we need to check for the specific enum value
that isSafeToClobberEFLAGS was hiding.

I've also adjusted which sites pass an explicit value for
Neighborhood since the default for computeRegisterLivness is 10.
2020-08-08 12:31:58 -07:00
Craig Topper 8d3ae64b04 Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
I messed up the bug numbers in the commit message before

Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR46315.
2020-08-08 11:53:14 -07:00
Craig Topper 761f568420 Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
This reverts commit 44b260cb0a.

I messed up the bug number in the commit message so I'm reverting
to fix it.
2020-08-08 11:53:14 -07:00
Simon Pilgrim cc15380f10 [X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.
Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08 19:36:18 +01:00
Craig Topper 44b260cb0a [X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places
Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR43014
2020-08-08 11:29:41 -07:00
Simon Pilgrim f13e92d4b2 [InstCombine] Use CreateVectorSplat(ElementCount) variant directly
This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08 19:26:02 +01:00
Roman Lebedev e492f0e03b
[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev 1f452ac1d7
[NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev a587bf3eb0
[NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Sanjay Patel f22ac1d15b [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
2020-08-08 10:38:06 -04:00
Benjamin Kramer 38537307e5 lib/CodeGen doesn't depend on lib/Passes. 2020-08-08 13:40:24 +02:00
Juneyoung Lee b6d9add71b [InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.

I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.

The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85533
2020-08-08 15:22:29 +09:00
Craig Topper 514b00c439 [X86] Limit the scope of the min/max canonicalization in combineSelect
Previously the transform was doing these two canonicalizations
(x > y) ? x : y -> (x >= y) ? x : y
(x < y) ? x : y -> (x <= y) ? x : y

But those don't seem to be useful generally. And they actively
pessimize the cases in PR47049.

This patch limits it to
(x > 0) ? x : 0 -> (x >= 0) ? x : 0
(x < -1) ? x : -1 -> (x <= -1) ? x : -1

These are the cases mentioned in the comments as the motivation
for the canonicalization. These allow the CMOV to use the S
flag from the compare thus improving opportunities to use a TEST
or the flags from an arithmetic instruction.
2020-08-07 22:51:49 -07:00
Keno Fischer c58674df14 [X86] Don't produce bad x86andp nodes for i1 vectors
In D85499, I attempted to fix this same issue by canonicalizing
andnp for i1 vectors, but since there was some opposition to such
a change, this commit just fixes the bug by using two different
forms depending on which kind of vector type is in use. We can
then always decide to switch the canonical forms later.

Description of the original bug:
We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x).
However, it does so by attempting to create an i64 vector with the number
of elements obtained by truncating division by 64 from the bitwidth. This is
bad for mask vectors like v8i1, since that division is just zero. Besides,
we don't want i64 vectors anyway. For i1 vectors, switch the pattern
to (andnp (not cond), x), which is the canonical form for `kandn`
on mask registers.

Fixes https://github.com/JuliaLang/julia/issues/36955.

Differential Revision: https://reviews.llvm.org/D85553
2020-08-07 20:05:47 -04:00
Yuanfang Chen f5b5ccf2a6 Reland "Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager""
This relands commit 320eab2d55.

The test failed because it was looking for x86-linux target
unconditionally. Now it gets the default target.
2020-08-07 16:40:49 -07:00
Matt Arsenault 3c0597a9e4 AMDGPU: Avoid explicitly listing all the memory nodes 2020-08-07 19:22:46 -04:00
Vitaly Buka 648228bcc3 [NFC][StackSafety] Fix statistics 2020-08-07 16:18:52 -07:00
Arthur Eubanks 7abef41674 [NewPM] Print 'Skipping pass' as pass instrumentation
If OptNoneInstrumentation prints it instead, 'Skipping pass' will print for even required passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85493
2020-08-07 15:02:02 -07:00
Mircea Trofin 64372d93bc [NFC][MLInliner] Refactor logging implementation
This prepares it for logging externally-specified outputs.

Differential Revision: https://reviews.llvm.org/D85451
2020-08-07 14:56:56 -07:00
Vitaly Buka 7547508b7a Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a804.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Vitaly Buka 7d4996033b [StackSafety,NFC] Add Stats counters 2020-08-07 14:02:50 -07:00
Gui Andrade 17ff170e3a Revert "[MSAN] Instrument libatomic load/store calls"
Problems with instrumenting atomic_load when the call has no successor,
blocking compiler roll

This reverts commit 33d239513c.
2020-08-07 19:45:51 +00:00
Yuanfang Chen 320eab2d55 Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager"
This reverts commit 911565d108.

Broke some non-Linux bots.
2020-08-07 11:59:58 -07:00
Jianzhou Zhao aedaa077f5 Reduce dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.

Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.

A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.

Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.

Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.

{F12506218}
{F12506221}

Reviewed By: mehdi_amini, tejohnson, jdoerfert

Differential Revision: https://reviews.llvm.org/D85379
2020-08-07 11:36:30 -07:00
Arthur Eubanks 1bf4629f11 [PPC] Rename bool-ret-to-int -> ppc-bool-ret-to-int
Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D85391
2020-08-07 11:27:05 -07:00
Arthur Eubanks 2b5502c350 [NFC] Use value initializer for OVERLAPPED
To fix
../llvm/lib/Support/Windows/Path.inc(1265,21): warning: missing field
'InternalHigh' initializer [-Wmissing-field-initializers]
  OVERLAPPED OV = {0};

Differential Revision: https://reviews.llvm.org/D85480
2020-08-07 11:18:33 -07:00
Vang Thao 04bd5b5286 [AMDGPU] Fix not rescheduling without clustering
Regions are sometimes skipped which should be rescheduled without memory op
clustering. RegionIdx is not incremented when iterating over regions that
are flagged to be skipped, causing the index to be incorrect.

Thanks to Vang Thao for discovering this bug!

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85498
2020-08-07 11:15:58 -07:00
Yuanfang Chen 911565d108 [NewPM][CodeGen] Introduce machine pass and machine pass manager
machine pass could define four methods:
- `PreservedAnalyses run(MachineFunction &, MachineFunctionAnalysisManager &)`
- `Error doInitialization(Module &, MachineFunctionAnalysisManager &)`
- `Error doFinalization(Module &, MachineFunctionAnalysisManager &)`
- `Error run(Module &, MachineFunctionAnalysisManager &)`

machine pass manger:
- MachineFunctionAnalysisManager:
  Basically an AnalysisManager<MachineFunction> augmented with the ability to
  register and query IR analyses
- MachineFunctionPassManager: support only two methods, `addPass` and `run`

Reviewed By: arsenm, asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D67687
2020-08-07 11:00:31 -07:00
Yuanfang Chen 954bd9c861 [NewPM] Only verify loop for nonskipped user loop pass
No verification for pass mangers since it is not needed.
No verification for skipped loop pass since the asserted condition is not used.

Add a BeforeNonSkippedPass callback for this. The callback needs more
inputs than its parameters to work so the callback is added on-the-fly.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D84977
2020-08-07 11:00:31 -07:00
Mitch Phillips 382df1c674 Revert "Reland D64327 [MC][ELF] Allow STT_SECTION referencing SHF_MERGE on REL targets"
This reverts commit b497665d98.

Spent some time trying to reproduce this locally, reverting in a
desparate attempt to fix the sanitizer buildbot:
 - http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/28828

I don't know exactly why or how this patch breaks the bots, but it seems
pretty concrete that it's the culprit.
2020-08-07 10:56:33 -07:00
Amy Kwan 98eccec3ae [PowerPC] Add Vector Extract/Expand/Count with Mask, Move to VSR Mask Instruction Definitions and MC Tests
This patch adds the instruction definitions and assembly/disassembly tests for
the following set of instructions:

Vector Extract [byte | half | word | doubleword | quad] with mask
Vector Expand [byte | half | word | doubleword | quad] with mask
Move to VSR [byte | byte immediate | half | word | doubleword | quad] with mask
Vector Count Mask Bits [byte | half | word | doubleword]

Differential Revision: https://reviews.llvm.org/D83724
2020-08-07 11:02:08 -05:00
Kamau Bridgeman d8c6d083c9 [PowerPC][PCRelative] Set TLS unsupported with PC relative memops
Introduce a fatal error if any thread local storage code is compiled
using pc relative memory operations as well as a hidden override
option `-enable-ppc-pcrel-tls` so that this support can be incrementally
added if possible.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D85448
2020-08-07 10:56:24 -05:00
Jay Foad ffe1edfc53 [NFC][GVN] Fix "avaliable" typos
Differential Revision: https://reviews.llvm.org/D85520
2020-08-07 14:22:24 +01:00
Bevin Hansson 5de6c56f7e [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
Summary:
This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

Reviewers: leonardchan, craig.topper, bjope, jdoerfert

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83216
2020-08-07 15:09:24 +02:00
Simon Pilgrim 66a163f328 [DAG] GetDemandedBits - remove custom AND handling.
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).

The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
2020-08-07 12:55:47 +01:00
Simon Pilgrim fcefb53222 Remove unreachable break. NFC 2020-08-07 12:37:49 +01:00
Kazushi (Jam) Marukawa 63bc5d7863 [VE] Change to expand multiply related instructions
Change to expand MULHU/MULHS/UMUL_LOHI/SMUL_LOHI for i32 and i64 since
those instructions are not available on Aurora SX VE.  Some of them
are used in expansion of i128 multiply, so need to modify them to
support i128.  Then, update basic arithmetic regression tests of
i128 and signed/unsigned i32 typed integer values.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85490
2020-08-07 18:22:25 +09:00
Igor Kudrin 1eade73d8b [DebugInfo] Remove DwarfUnit::getDwarfVersion(). NFC.
This helper method was used only in one place, which can easily use the
direct call.

Differential revision: https://reviews.llvm.org/D85438
2020-08-07 15:55:44 +07:00
Igor Kudrin b6b0ff18a3 [DebugInfo] Clean up DIEUnit. NFC.
This removes members of the DIEUnit class which were used only in unit
tests. Note also that child classes shadowed some of these methods,
namely, getDwarfVersion() was overridden in DwartfUnit and getLength()
was overridden in DwarfCompileUnit.

Differential Revision: https://reviews.llvm.org/D85436
2020-08-07 15:55:44 +07:00
Shinji Okumura c575ba28de [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-07 17:35:12 +09:00
Christian Kühnel f3cc4df51d Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit 1adc494bce.
This patch broke the Windows compilation on buildbot and pre-merge testing:
http://lab.llvm.org:8011/builders/mlir-windows/builds/5945
https://buildkite.com/llvm-project/llvm-master-build/builds/780
2020-08-07 09:36:49 +02:00
QingShan Zhang 2b2bfdb474 [NFC] Add the stats for load/store cluster
We have the stats for MacroFusion but miss it for load/store cluster.
2020-08-07 07:09:48 +00:00
David Sherwood 0905d9f31e [SVE][CodeGen] Fix bug with store of unpacked FP scalable vectors
Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td
for storing out <vscale x 2 x f32> unpacked scalable vectors. Added
a couple of tests to

  test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential Revision: https://reviews.llvm.org/D85441
2020-08-07 07:19:09 +01:00
biplmish cce1b0e891 [PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D84622
2020-08-07 01:02:29 -05:00
QingShan Zhang 55de46f3b2 [PowerPC] Support constrained fp operation for setcc
The constrained fp operation fcmp was added by https://reviews.llvm.org/D69281.
This patch is trying to add the support for PowerPC backend.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81727
2020-08-07 05:16:36 +00:00
QingShan Zhang 3359ea62ed [Scheduling] Create the missing dependency edges for store cluster
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa
as they both depend on the base register "reg"

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |
|        |
|        |
|    +---+---+
|    |  SUa  |  Load 0(reg)
|    +---+---+
|        ^
|        |
|        |
|    +---+---+
+----+  SUb  |  Load 4(reg)
     +-------+

But if it is store cluster, we need to create it as follow shows to avoid the instruction store
depend on scheduled in-between SUb and SUa.

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |         Missing       +-------+
|        | +-------------------->+   y   |
|        | |                     +---+---+
|    +---+-+-+                       ^
|    |  SUa  |  Store x 0(reg)       |
|    +---+---+                       |
|        ^                           |
|        |  +------------------------+
|        |  |
|    +---+--++
+----+  SUb  |  Store y 4(reg)
     +-------+

Reviewed By: evandro, arsenm, rampitec, foad, fhahn

Differential Revision: https://reviews.llvm.org/D72031
2020-08-07 04:58:03 +00:00
Vitaly Buka 7fb9de2c6f [StackSafety,NFC] Fix tests in debug 2020-08-06 20:46:39 -07:00
Shinji Okumura f13f2e16f0 [Attributor] Check violation of returned position nonnull and noundef attribute in AAUndefinedBehavior
This patch is a follow up of D84733.
If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85178
2020-08-07 12:02:42 +09:00
Vitaly Buka 58b95c9b2b [StackSafety,NFC] Add debug counters 2020-08-06 19:24:02 -07:00
Vitaly Buka 92dcf12b2f [StackSafety,NFC] Use CHECK-EMPTY in tests 2020-08-06 19:19:51 -07:00
Vitaly Buka faeeed6f52 [LLParser,NFC] Simplify forward GV refs update
Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85238
2020-08-06 19:18:51 -07:00
Vitaly Buka 0b2616a804 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-06 19:10:33 -07:00
Vitaly Buka 5c6d9b2bbf [LTO,NFC] Skip generateParamAccessSummary when empty
addGlobalValueSummary can check newly added FunctionSummary
and set HasParamAccess to mark that generateParamAccessSummary
is needed.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85182
2020-08-06 19:01:19 -07:00
Kazushi (Jam) Marukawa f92e0d9384 [VE] Optimize trunc related instructions
Change to not generate truncate instructions if all use of a truncate
operation don't care about higher bits.  For example, an i32 add
instruction doesn't care about higher 32 bits in 64 bit registers.
Updates regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85418
2020-08-07 09:21:05 +09:00
Jessica Paquette c8a282bcf7 [GlobalISel] Fix computing known bits for loads with range metadata
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.

e.g.

```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```

When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.

To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.

Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.

https://reviews.llvm.org/D85375
2020-08-06 16:47:07 -07:00
Matt Arsenault 1ad051dd8c GlobalISel: Implement lower for G_INSERT_VECTOR_ELT 2020-08-06 19:29:17 -04:00
Yonghong Song c50f5dece9 BPF: fix libLLVMBPFCodeGen.so build failure
Buildbot reported a build failure when building shared
library libLLVMBPFCodeGen.so with unknown reference to
"createCFGSimplificationPass".

Commit 87cba43402 ("BPF: add a SimplifyCFG IR pass during
generic Scalar/IPO optimization") added an IR pass SimplifyCFG
by BPF target. The commit called function
createCFGSimplificationPass() defined in "Scalar" library.
Add this library in Target/BPF/LLVMBuild.txt so
shared library build can succeed.
2020-08-06 15:27:15 -07:00
Matt Arsenault 87b2af8140 AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns 2020-08-06 18:00:38 -04:00
Roman Lebedev be02adfad7
[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2)
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here, we don't really want to increase instruction count,
so we need to both lie that "IsNegation" and have an one-use check
on the outermost LHS value.
2020-08-06 23:40:16 +03:00
Roman Lebedev 0c1c756a31
[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into LHS op.

But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.

Tests and original patch by: Simon Pilgrim @RKSimon!

Differential Revision: https://reviews.llvm.org/D85446
2020-08-06 23:39:53 +03:00
Roman Lebedev 7ce76b06ec
[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C)
While that does increases instruction count,
shift is obviously better than a division.

Name: base
Pre: (1<<C1) >= 0
%o0 = shl i8 1, C1
%r = sdiv exact i8 C0, %o0
  =>
%r = ashr exact i8 C0, C1

Name: neg
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0
  =>
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0

Name: reverse
Pre: C1 != 0 && C1 u< 8
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0
  =>
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0

https://rise4fun.com/Alive/MRplf
2020-08-06 23:37:16 +03:00
Roman Lebedev 47aec80e4a
[NFC][InstCombine] Negator: add a comment about negating exact arithmentic shift 2020-08-06 23:37:16 +03:00
Roman Lebedev 442cb88f53
[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle non-splat vectors 2020-08-06 23:37:15 +03:00
Craig Topper ffc248f3b8 [LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers.
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.

Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.

Differential Revision: https://reviews.llvm.org/D85468
2020-08-06 13:18:16 -07:00
Yonghong Song 87cba43402 BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization
The following bpf linux kernel selftest failed with latest
llvm:
  $ ./test_progs -n 7/10
  ...
  The sequence of 8193 jumps is too complex.
  verification time 126272 usec
  stack depth 320
  processed 114799 insns (limit 1000000)
  ...
  libbpf: failed to load object 'pyperf600_nounroll.o'
  test_bpf_verif_scale:FAIL:110
  #7/10 pyperf600_nounroll.o:FAIL
  #7 bpf_verif_scale:FAIL

After some investigation, I found the following llvm patch
  https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting common instructions
in SimplifyCFG by default. Later on, the code changes and a
SimplifyCFG phase with hoisting on cannot do the work any more.

A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
  for.cond:
    %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
    %cmp = icmp ult i32 %i.0, 6
    br i1 %cmp, label %for.body, label %for.cond.cleanup

  for.cond.cleanup:
    %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %cmp2 = icmp eq i8* %2, null
    %conv = zext i1 %cmp2 to i32
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
    ret i32 %conv

  for.body:
    %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %tobool.not = icmp eq i8* %3, null
    br i1 %tobool.not, label %for.inc, label %land.lhs.true

The first two insns of `for.cond.cleanup` and `for.body`, load and
icmp, can be hoisted to `for.cond` block. With Patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
added addition phi nodes to `for.body` and hoisting cannot
be done any more.

Note such a hoisting is beneficial to bpf programs as
bpf verifier does path sensitive analysis and verification.
The hoisting preverts reloading from stack which will assume
conservative value and increase exploited insns. In this case,
it caused verifier failure.

To fix this problem, I added an IR pass from bpf target
to performance additional simplifycfg with hoisting common inst
enabled.

Differential Revision: https://reviews.llvm.org/D85434
2020-08-06 13:16:00 -07:00
Snehasish Kumar 8d943a928d [NFC] Rename BBSectionsPrepare -> BasicBlockSections.
Rename the BBSectionsPrepare pass as suggested by the review comment in
https://reviews.llvm.org/D85368.

Differential Revision: https://reviews.llvm.org/D85380
2020-08-06 13:12:06 -07:00
Sanjay Patel 250a167c41 [InstSimplify] avoid crashing by trying to rem-by-zero
Bug was noted in the post-commit comments for:
rGe8760bb9a8a3
2020-08-06 16:06:31 -04:00
Anton Afanasyev a7478fab6c [SLP] Fix order of `insertelement`/`insertvalue` seed operands
Summary:
This patch takes the indices operands of `insertelement`/`insertvalue`
into account while generation of seed elements for `findBuildAggregate()`.
This function has kept the original order of `insert`s before.
Also this patch optimizes `findBuildAggregate()` preventing it from
redundant temporary vector allocations and its multiple reversing.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779
2020-08-06 22:09:24 +03:00
dfukalov 4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
Matt Arsenault e00201539f GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT
Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
2020-08-06 14:33:16 -04:00
Matt Arsenault 1a0c0944c6 AMDGPU: Define raw/struct variants of buffer atomic fadd
Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.
2020-08-06 13:36:19 -04:00
Matt Arsenault eae9c54148 AArch64/GlobalISel: Fix verifier error after selecting returnaddress
This was caching the wrong register to re-use later.
2020-08-06 13:18:05 -04:00
Mircea Trofin ca7973cf18 [NFC]{MLInliner] Point out the tests' model dependencies 2020-08-06 09:57:26 -07:00
Matt Arsenault 90eb7d5283 AMDGPU: Fix spilling of 96-bit AGPRs 2020-08-06 12:42:07 -04:00
Matt Arsenault 56270d1d42 AMDGPU/GlobalISel: Start trying to handle AGPR bank
Try to use AGPR banks for the various merge/unmerge type
operations. Previously these would introduce copies to VGPR.
2020-08-06 12:39:50 -04:00
Matt Arsenault 34040a4f61 GlobalISel: Define InvalidRegBankID enum value 2020-08-06 12:39:49 -04:00
Mircea Trofin 87fb7aa137 [llvm][MLInliner] Don't log 'mandatory' events
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373
2020-08-06 09:04:15 -07:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault 63c4be53cf AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub
This produces worse results right now for i8 vectors, but that should
be addressed when we actually try to optimize packed vectors.
2020-08-06 11:08:45 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
jasonliu e5062a6caf [XCOFF][AIX] Put each jump table in an independent section if -ffunction-sections is specified
If a function is in a unique section, putting all jump tables in
 .rodata will prevent functions that have a jump table to get
garbage collect by the linker.
Therefore, we need to put jump table into a unique section as well.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D84761
2020-08-06 14:31:04 +00:00
Matt Arsenault 5a503521e7 AMDGPU/GlobalISel: Implement expansion for rsq.clamp
Not sure why we handle this removed instruction on newer subtargets
for this one and no others, but maintain compatibility with the DAG.
2020-08-06 10:23:25 -04:00
Matt Arsenault c015cbc68b AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops 2020-08-06 10:07:22 -04:00
Matt Arsenault 28124a0a63 AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering
We really need to put this undef padding stuff into a helper
somewhere, but leave that for when this is moved to generic code.
2020-08-06 09:55:35 -04:00
Matt Arsenault 6c7f640bf7 AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses 2020-08-06 09:50:36 -04:00
Matt Arsenault 37894ba661 AMDGPU/GlobalISel: Make s16 phi legal
If we were to have an operation with an s16 def that needs to be
executed in a waterfall loop, not having s16 legal would place an
avoidable burden on RegBankSelect to widen it.
2020-08-06 09:41:14 -04:00
Matt Arsenault 5316256709 AMDGPU/GlobalISel: Fix assert on copy to vcc
This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.
2020-08-06 09:41:14 -04:00
Raphael Isemann 1de43bd6df Revert "PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI."
This reverts commit 87c5437afd.

The commit includes several headers in the middle of a function, which
breaks pretty much everything.
2020-08-06 15:15:43 +02:00
Xing GUO 40506d5e2f [DWARFYAML][debug_info] Make the 'Values' field optional.
This patch makes the 'Values' field optional. This is useful when we
handcraft the terminating entry of DIEs.

```
debug_info:
  - Version:  4
    ...
    Entries:
      - AbbrCode: 1
        Values:
          - Value: 0x1234
      - AbbrCode: 0 ## Termination
```

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85397
2020-08-06 20:43:52 +08:00
Petar Avramovic d893278bba [GlobalISel][InlineAsm] Fix matching input constraint to physreg
Add given input and mark it as tied.
Doesn't create additional copy compared to
matching input constraint to virtual register.

Differential Revision: https://reviews.llvm.org/D85122
2020-08-06 14:35:51 +02:00
Simon Pilgrim 3d10050e37 BitstreamRemarkParser.h - remove unnecessary includes. NFCI.
Remove unused includes, moving to the lib header or cpp file as necessary.
2020-08-06 13:17:53 +01:00
Simon Pilgrim 807467009d [X86] getX86MaskVec - replace mask limit from NumElts < 8 with NumElts <= 4
As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off replacing the condition to <= 4, and I've added a pow2 assertion as well.
2020-08-06 11:46:19 +01:00
Simon Pilgrim 87c5437afd PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.
We already need to include raw_ostream.h, also add missing StringRef.h and cstdint implicit dependencies.

Remove unnecessary includes from PDBExtras.cpp
2020-08-06 11:28:42 +01:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Paul Walker 3ed59b775d [SVE] Implement lowering for fixed length vector multiplication.
NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327
2020-08-06 11:01:39 +01:00
Juneyoung Lee c771087161 [InstCombine] Fold freeze(undef) into a proper constant
This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84948
2020-08-06 18:40:04 +09:00
David Green 745bf6cf44 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Roman Lebedev a512c89476
[NFC][InstCombine] Refactor '(-NSW x) pred x' fold 2020-08-06 11:50:36 +03:00
Roman Lebedev 141357663e
[InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480)
Name: (-x) u<= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ule i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/V22

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 132be1f502
[InstCombine] (-NSW x) u< x --> x s< 0 (PR39480)
Name: (-x) u< x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ult i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/zSuf

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 0e1241a3c9
[InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480)
Name: (-x) u>= x  -->  x s>= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp uge i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/LLHd

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 16c642fa39
[InstCombine] (-NSW x) u> x --> x s> 0 (PR39480)
Name: (-x) u> x  -->  x s> 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ugt i8 %neg_x, %x
  =>
%r = icmp sgt i8 %x, 0

https://rise4fun.com/Alive/Raea

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 59387c0dd7
[InstCombine] (-NSW x) s<= x --> x s>= 0 (PR39480)
Name: (-x) s<= x  -->  x >= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sle i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/91k

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 01a6c4bd26
[InstCombine] (-NSW x) s< x --> x s> 0 (PR39480)
Name: (-x) s< x  -->  x > 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp slt i8 %neg_x, %x
  =>
%r = icmp sgt i8 %x, 0

https://rise4fun.com/Alive/3IXb

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 3885207651
[InstCombine] (-NSW x) s>= x --> x s<= 0 (PR39480)
Name: (-x) s>= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sge i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/Hdip

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Roman Lebedev 8878b79cfe
[InstCombine] (-NSW x) ==/!= x --> x ==/!= 0 (PR39480)
Name: (-x) == x  -->  x == 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp eq i8 %neg_x, %x
  =>
%r = icmp eq i8 %x, 0

Name: (-x) != x  -->  x != 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ne i8 %neg_x, %x
  =>
%r = icmp ne i8 %x, 0

https://rise4fun.com/Alive/4slH

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Roman Lebedev 5060f5682b
[InstCombine] (-NSW x) s> x --> x s< 0 (PR39480)
Name: (-x) s> x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sgt i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/ZslD

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Xing GUO 4357986b41 [DWARFYAML][debug_info] Pull out dwarf::FormParams from DWARFYAML::Unit.
Unit.Format, Unit.Version and Unit.AddrSize are replaced with
dwarf::FormParams in D84496 to get rid of unnecessary functions
getOffsetSize() and getRefSize(). However, that change makes it
difficult to make AddrSize optional (Optional<uint8_t>). This change
pulls out dwarf::FormParams from DWARFYAML::Unit and use it as a helper
struct in DWARFYAML::emitDebugInfo().

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D85296
2020-08-06 16:39:00 +08:00
Craig Topper 504a197fe5 [X86] Rename X86::getImpliedFeatures to X86::updateImpliedFeatures and pass clang's StringMap directly to it.
No point in building a vector of StringRefs for clang to apply to the
StringMap. Just pass the StringMap and modify it directly.
2020-08-06 00:20:46 -07:00
Martin Storsjö f5e6fbac24 [AArch64] [Windows] Error out on unsupported symbol locations
These might occur in seemingly generic assembly. Previously when
targeting COFF, they were silently ignored, which certainly won't
give the right result. Instead clearly error out, to make it clear
that the assembly needs to be adjusted for this target.

Also change a preexisting report_fatal_error into a proper error
message, pointing out the offending source instruction. This isn't
strictly an internal error, as it can be triggered by user input.

Differential Revision: https://reviews.llvm.org/D85242
2020-08-06 09:23:46 +03:00
Martin Storsjö 5eedc01a82 [ARM, AArch64] Fix a comment typo. NFC. 2020-08-06 09:23:45 +03:00
Chuanqi Xu 92f1f1e40d [Coroutines] Use to collect lifetime marker of in CoroFrame Differential Revision: https://reviews.llvm.org/D85279 2020-08-06 14:21:55 +08:00
Craig Topper 0215ae9735 [X86] Remove incomplete custom handling of i128 sdivrem/udivrem on Windows.
We need to have special handling of i128 div/rem on Windows due
to a weird calling convention needed for the libcall. There was
also some code that made it look like we do the same for sdivrem/udiv,
but the code didn't account for multiple return values of those
functions so couldn't possibly work. I think this code never
triggers because we don't have libcall names defined for those
functions by default so DAGCombine never creates DIVREM nodes.
2020-08-05 23:01:07 -07:00
Lang Hames ba8683f292 [JITLink][MachO][AArch64] More PAGEOFF12 relocation fixes.
Correctly sign extend the addend, and fix implicit shift operand decoding
(it incorrectly returned 0 for some cases), and check that the initial
encoded immediate is 0.
2020-08-05 21:09:45 -07:00
Matt Arsenault 0ee1eba581 AMDGPU: Remove ATOMIC_PK_FADD
The f32 and v2f16 cases should be handled the same way.
2020-08-05 22:00:52 -04:00
Ruiling Song 5ddc8b49ba [AMDGPU] add buffer_atomic_swap for float
The functionality is used when calling imageAtomicExhange() on float
type imageBuffer in Graphics shaders.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85187
2020-08-06 09:45:48 +08:00
Juneyoung Lee 9f717d7b94 [JumpThreading] Allow duplicating a basic block into preds when its branch condition is freeze(phi)
This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .

This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it calls the function when the condition is
phi only).

Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.

```
    p = phi [a, pred1] [b, pred2]
    p.fr = freeze p
    br p.fr, ...
=>
  pred1:
    p.fr = freeze a
    br p.fr, ...
  pred2:
    p.fr2 = freeze b
    br p.fr2, ...
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85029
2020-08-06 09:51:17 +09:00
Petr Hosek 1adc494bce [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-05 16:07:11 -07:00
Craig Topper 08b2d0a963 [X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element.
For example a v4f16 argument is scalarized to 4 i32 values. So
the values are spread out instead of being packed tightly like
in the original vector.

Fixes PR47000.
2020-08-05 15:44:54 -07:00
Greg Clayton e1de85f9f4 Add verification for DW_AT_decl_file and DW_AT_call_file.
LTO builds have been creating invalid DWARF and one of the errors was a file index that was out of bounds. "llvm-dwarfdump --verify" will check all file indexes for line tables already, but there are no checks for the validity of file indexes in attributes.

The verification will verify if there is a DW_AT_decl_file/DW_AT_call_file that:
- there is a line table for the compile unit
- the file index is valid
- the encoding is appropriate

Tests are added that test all of the above conditions.

Differential Revision: https://reviews.llvm.org/D84817
2020-08-05 15:30:13 -07:00
Sanjay Patel c66169136f [InstCombine] fold icmp with 'mul nsw/nuw' and constant operands
This also removes a more specific fold that only handled icmp with 0.

https://rise4fun.com/Alive/sdM9

  Name: mul nsw with icmp eq
  Pre: (C1 != 0) && (C2 % C1) == 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = icmp eq i8 %x, C2 / C1

  Name: mul nuw with icmp eq
  Pre: (C1 != 0) && (C2 %u C1) == 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = icmp eq i8 %x, C2 /u C1

  Name: mul nsw with icmp ne
  Pre: (C1 != 0) && (C2 % C1) == 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = icmp ne i8 %x, C2 / C1

  Name: mul nuw with icmp ne
  Pre: (C1 != 0) && (C2 %u C1) == 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = icmp ne i8 %x, C2 /u C1
2020-08-05 17:29:32 -04:00
Stanislav Mekhanoshin 0bcda1a261 [AMDGPU] Scavenge temp reg for AGPR spill
Differential Revision: https://reviews.llvm.org/D85234
2020-08-05 13:29:19 -07:00
Rahman Lavaee 20a568c29d [Propeller]: Use a descriptive temporary symbol name for the end of the basic block.
This patch changes the functionality of AsmPrinter to name the basic block end labels as LBB_END${i}_${j}, with ${i} being the identifier for the function and ${j} being the identifier for the basic block. The new naming scheme is consistent with how basic block labels are named (.LBB${i}_{j}), and how function end symbol are named (.Lfunc_end${i}) and helps to write stronger tests for the upcoming patch for BB-Info section (as proposed in https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). The end label is used with basicblock-labels (BB-Info section in future) and basicblock-sections to compute the size of basic blocks and basic block sections, respectively. For BB sections, the section containing the entry basic block will not have a BB end label since it already gets the function end-label.
This label is cached for every basic block (CachedEndMCSymbol) like the label for the basic block (CachedMCSymbol).

Differential Revision: https://reviews.llvm.org/D83885
2020-08-05 13:17:19 -07:00
Matt Arsenault ec8c172d01 AMDGPU: Correct prolog SP initialization logic
Having callees that will read SP is not the only reason we need to
reference the stack pointer.
2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Arthur Eubanks 9e6a1e5781 [NewPM][LoopRotate] Rename rotate -> loop-rotate
To match legacy pass name.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85338
2020-08-05 12:25:01 -07:00
Matt Arsenault 83eaf5d55d AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node
This is redundant with the other no return buffer atomic node, and we
don't really need a separate type profile for it.
2020-08-05 15:16:51 -04:00
Roman Lebedev f3056dcc02
[InstCombine] Negator: -(cond ? x : -x) --> cond ? -x : x
We were errneously only doing that for old-style abs/nabs,
but we have no such legality check on the condition of the select.

https://rise4fun.com/Alive/xBHS
2020-08-05 21:47:30 +03:00
Matt Arsenault 43c0c9252a AMDGPU: Refactor buffer atomic intrinsic lowering
Move raw/struct buffer atomic lowering to separate functions. This
avoids a long nested switch, and simplifies a future patch.
2020-08-05 14:44:55 -04:00
Matt Arsenault 3e52667433 AMDGPU: Fix verifier error with undef source producing s_bitset*
This needs to preserve the undef flag.
2020-08-05 14:42:20 -04:00
Sanjay Patel e8760bb9a8 [InstSimplify] fold icmp with mul nsw and constant operands
https://rise4fun.com/Alive/slvl

  Name: mul nsw with icmp eq
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nsw with icmp ne
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

Follow-up to the 'nuw' variation added with:
rGf879c9b79621
2020-08-05 14:38:39 -04:00
Sanjay Patel f879c9b796 [InstSimplify] fold icmp with mul nuw and constant operands
https://rise4fun.com/Alive/pZEr

  Name: mul nuw with icmp eq
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nuw with icmp ne
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
2020-08-05 14:32:17 -04:00
Evgenii Stepanov f2c0423995 [msan] Remove readnone and friends from call sites.
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.

This change removes the attributes from call sites, as well.

Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.

Differential Revision: https://reviews.llvm.org/D85259
2020-08-05 10:34:45 -07:00
Simon Pilgrim b60f998859 [X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors
This is seen in the sub-128-bit vector trunc(ext()) of comparison results

Fixes pr46585.ll regression in D66004
2020-08-05 18:27:40 +01:00
Jordan Rupprecht 3c39db0c44 Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e4. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
  return ReductionOperations;
```
2020-08-05 10:24:15 -07:00
Mircea Trofin b18c41c66f [TFUtils] Expose untyped accessor to evaluation result tensors
These were implementation detail, but become necessary for generic data
copying.

Also added const variations to them, and move assignment, since we had a
move ctor (and the move assignment helps in a subsequent patch).

Differential Revision: https://reviews.llvm.org/D85262
2020-08-05 10:22:45 -07:00
David Green e9761688e4 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-05 18:14:05 +01:00
Roman Lebedev a05ec856a3
[NFC][InstCombine] Negator: include all the needed headers, IWYU 2020-08-05 20:12:36 +03:00
Roman Lebedev 3a3c9519e2
[InstCombine] Negator: 0 - (X + Y) --> (-X) - Y iff a single operand negated
This was the most obvious regression in
f5df5cd5586ae9cfb2d9e53704dfc76f47aff149.f5df5cd5586ae9cfb2d9e53704dfc76f47aff149

We really don't want to do this if the original/outermost subtraction
isn't a negation, and therefore doesn't go away - just sinking negation
isn't a win. We are actually appear to be missing folds so hoist it.

https://rise4fun.com/Alive/tiVe
2020-08-05 20:01:13 +03:00
Lang Hames 47cfffe893 [JITLink][AArch64] Handle addends on PAGE21 / PAGEOFF12 relocations. 2020-08-05 08:50:46 -07:00
Lang Hames d561d1bf96 [JITLink][AArch64] Improve debug output for addend relocations. 2020-08-05 08:50:46 -07:00
Sanjay Patel bd2c88b253 [InstSimplify] reduce code duplication in simplifyICmpWithMinMax(); NFC 2020-08-05 11:39:28 -04:00
Simon Pilgrim 6a06c7a0a7 [X86] isHorizontalBinOp - only update LHS/RHS references on success
We've had issues in the past where isHorizontalBinOp calls would affect later combines as the LHS/RHS references had been commuted but still failed to match.
2020-08-05 15:09:52 +01:00
Simon Pilgrim a57bfb44bc [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types 2020-08-05 15:09:51 +01:00
Georgii Rymar 6ae5b9e405 [llvm-readobj] - Make decode_relrs() don't return Expected<>. NFCI.
The `decode_relrs` helper is declared as:

`Expected<std::vector<Elf_Rel>> decode_relrs(Elf_Relr_Range relrs) const;`

it never returns an error though and hence can be simplified to return
a vector.

Differential revision: https://reviews.llvm.org/D85302
2020-08-05 17:05:47 +03:00
Denis Antrushin d21ce40821 [Statepoints] Operand folding in presense of tied registers.
Implement proper folding of statepoint meta operands (deopt and GC)
when statepoint uses tied registers.
For deopt operands it is just about properly preserving tiedness
in new instruction.
For tied GC operands folding is a little bit more tricky.
We can fold tied GC operands only from InlineSpiller, because it knows
how to properly reload tied def after it was turned into memory operand.
Other users (e.g. peephole) cannot properly fold such operands as they
do not know how (or when) to reload them from memory.
We do this by un-tieing operand we want to fold in InlineSpiller
and allowing to fold only untied operands in foldPatchpoint.
2020-08-05 20:18:28 +07:00
Roman Lebedev f5df5cd558
Recommit "[InstCombine] Negator: -(X << C) --> X * (-1 << C)"
This reverts commit ac70b37a00
which reverted commit 8aeb2fe13a
because codegen tests got broken and i needed time to investigate.

This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.

https://rise4fun.com/Alive/1Gn
2020-08-05 15:59:13 +03:00
Sam Parker f2675ab45f [ARM][CostModel] Implement getCFInstrCost
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.

Differential Revision: https://reviews.llvm.org/D85283
2020-08-05 12:44:51 +01:00
Hans Wennborg 3ab01550b6 Revert "[CMake] Simplify CMake handling for zlib"
This quietly disabled use of zlib on Windows even when building with
-DLLVM_ENABLE_ZLIB=FORCE_ON.

> Rather than handling zlib handling manually, use find_package from CMake
> to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
> HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
> set to YES, which requires the distributor to explicitly select whether
> zlib is enabled or not. This simplifies the CMake handling and usage in
> the rest of the tooling.
>
> This is a reland of abb0075 with all followup changes and fixes that
> should address issues that were reported in PR44780.
>
> Differential Revision: https://reviews.llvm.org/D79219

This reverts commit 10b1b4a231 and follow-ups
64d99cc6ab and
f9fec0447e.
2020-08-05 12:31:44 +02:00
Paul Walker 927fc536ca [SVE] Add lowering for fixed length vector and, or & xor operations.
Since there are no ill effects when performing these operations
with undefined elements, they are lowered to the already supported
unpredicated scalable vector equivalents.

Differential Revision: https://reviews.llvm.org/D85117
2020-08-05 11:28:34 +01:00
Simon Pilgrim 4aaf301fb8 [DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x)))
We currently don't do anything to fold any_extend vector loads as no target has such an instruction.

Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on.

We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication.

Fixes the regression I mentioned in rG99a971cadff7

Differential Revision: https://reviews.llvm.org/D85129
2020-08-05 11:22:23 +01:00
Georgii Rymar f97019ad6e [llvm-readobj/elf] - Add a testing for --stackmap and refine the implementation.
Currently, we only test the `--stackmap` option here:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Object/stackmap-dump.test
it uses a precompiled MachO binary currently and I've found no tests for this option for ELF.

The implementation also has issues. For example, it might assert on a wrong version
of the .llvm-stackmaps section. Or it might crash on an empty or truncated section.

This patch introduces a new tools/llvm-readobj/ELF test file as well as implements a few
basic checks to catch simple crashes/issues

It also eliminates `unwrapOrError` calls in `printStackMap()`.

Differential revision: https://reviews.llvm.org/D85208
2020-08-05 13:09:04 +03:00
David Turner ba0e71432a Do not map read-only data memory sections with EXECUTE flags.
The code in SectionMemoryManager.cpp unnecessarily maps
read-only data sections with the READ+EXECUTE flags. This is
undesirable from a security stand-point.

Moreover, on the Fuchsia platform, which is now very strict
about mapping pages with the EXECUTE permission, this simply
fails, because the section's pages were initially allocated
with only the READ+WRITE flags.

A more detailed description of the issue can be found in this
public SwiftShader bug:

  https://issuetracker.google.com/issues/154586551

This patch just restrict the mapping to the READ flag for ROData
sections. Code sections are still mapped with READ+EXECUTE as
expected.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D78574
2020-08-05 10:51:48 +02:00
Sander de Smalen f2916636f8 [AArch64][SVE] Disable tail calls if callee does not preserve SVE regs.
This fixes an issue triggered by the following code, where emitEpilogue
got confused when trying to restore the SVE registers after the call,
whereas the call to bar() is implemented as a TCReturn:

  int non_sve();
  int sve(svint32_t x) { return non_sve(); }

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84869
2020-08-05 09:38:54 +01:00
Jay Foad 8cbf4a17ac [AMDGPU] Propagate fast math flags in frem lowering
Differential Revision: https://reviews.llvm.org/D84518
2020-08-05 09:09:38 +01:00
Jay Foad 04cf4a5a65 [AMDGPU] Lower frem f16
Without this it would fail to select on subtargets that have 16-bit
instructions.

Differential Revision: https://reviews.llvm.org/D84517
2020-08-05 09:08:40 +01:00
Juneyoung Lee e0d99e9aaf [JumpThreading] Consider freeze as a zero-cost instruction
This is a simple patch that makes freeze as a zero-cost instruction, as bitcast already is.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85023
2020-08-05 14:42:36 +09:00
Evgeniy Brevnov 02a629daad [BPI][NFC] Unify handling of normal and SCC based loops
This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representation and have to be handled separatly each time we deal with loops. D79485 is going to introduce much more extensive use of loops what will be problematic with out this change.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84838
2020-08-05 11:19:24 +07:00
Matt Arsenault 93cebb190a GlobalISel: Use buildAnyExtOrTrunc 2020-08-04 22:04:04 -04:00
Matt Arsenault 1ea182ce79 GlobalISel: Simplify code
This cannot be a vector of pointers, so using getScalarSizeInBits just
added a bit extra noise.
2020-08-04 22:03:59 -04:00
Matt Arsenault 8f65c933c4 GlobalISel: Fix redundant variable and shadowing 2020-08-04 22:03:55 -04:00
Matt Arsenault 54615ec48f GlobalISel: Move load/store lowering to separate functions 2020-08-04 22:03:51 -04:00
Zequan Wu e3df947175 [llvm-cov] reset executation count to 0 after wrapped segment
Fix the bug: https://bugs.llvm.org/show_bug.cgi?id=36979. It also fixes this bug: https://bugs.llvm.org/show_bug.cgi?id=35404, which I think is caused by the same problem.

Differential Revision: https://reviews.llvm.org/D85036
2020-08-04 18:38:44 -07:00
Fangrui Song 0c7af8c83b [X86] Optimize getImpliedDisabledFeatures & getImpliedEnabledFeatures after D83273
Previously the time complexity is O(|number of paths from the root to an
implied feature| * CPU_FWATURE_MAX) where CPU_FEATURE_MAX is 92.

The number of paths can be large (theoretically exponential).

For an inline asm statement, there is a code path
`clang::Parser::ParseAsmStatement -> clang::Sema::ActOnGCCAsmStmt -> ASTContext::getFunctionFeatureMap`
leading to potentially many calls of getImpliedEnabledFeatures (41 for my -march=native case).

We should improve the performance a bit in case the number of inline asm
statements is large (Linux kernel builds).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85257
2020-08-04 17:50:06 -07:00
Mircea Trofin 90b9c49ca6 [llvm] Expose type and element count-related APIs on TensorSpec
Added a mechanism to check the element type, get the total element
count, and the size of an element.

Differential Revision: https://reviews.llvm.org/D85250
2020-08-04 17:32:16 -07:00
Roman Lebedev ac70b37a00
Revert "[InstCombine] Negator: -(X << C) --> X * (-1 << C)"
Breaks codegen tests, will recommit later.

This reverts commit 8aeb2fe13a.
2020-08-05 03:19:38 +03:00
Roman Lebedev 8aeb2fe13a
[InstCombine] Negator: -(X << C) --> X * (-1 << C)
This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.

https://rise4fun.com/Alive/1Gn
2020-08-05 03:13:14 +03:00
Krzysztof Parzyszek 06d425737b [RDF] Add operator<<(raw_ostream&, RegisterAggr), NFC 2020-08-04 18:40:07 -05:00
Krzysztof Parzyszek 9521704553 [RDF] Use hash-based containers, cache extra information
This improves performance.
2020-08-04 18:36:49 -05:00
Yonghong Song 00602ee7ef BPF: simplify IR generation for __builtin_btf_type_id()
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
   if (obj is a lvalue)
     llvm.bpf.btf.type.id(obj.ptr, 1, flag)  !type
   else
     llvm.bpf.btf.type.id(obj, 0, flag)  !type
The purpose of the 2nd argument is to differentiate
   __builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
   __builtin_btf_type_id(obj.ptr, flag)

Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
  - obj is the same .e.g., nullptr
  - flag is the same
  - metadata type is different, e.g., typedef of struct "s"
    and strust "s".
In the above, we don't want CSE since their metadata is different.

This patch change IR builtin to
   llvm.bpf.btf.type.id(seq_num, flag)  !type
and seq_num is always increasing. This will prevent potential
llvm CSE.

Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.

Differential Revision: https://reviews.llvm.org/D85174
2020-08-04 16:29:42 -07:00
Krzysztof Parzyszek 4b25f67299 [RDF] Really remove remaining uses of PhysicalRegisterInfo::normalize 2020-08-04 18:23:38 -05:00
Krzysztof Parzyszek f0f467aeec [RDF] Cache register aliases in PhysicalRegisterInfo
This improves performance of PhysicalRegisterInfo::makeRegRef.
2020-08-04 18:10:00 -05:00
Krzysztof Parzyszek 47fe1b63f4 [RDF] Lower the sorting complexity in RDFLiveness::getAllReachingDefs
The sorting is needed, because reaching defs are (logically) ordered,
but are not collected in that order. This change will break up the
single call to std::sort into a series of smaller sorts, each of which
should use a cheaper comparison function than the original.
2020-08-04 18:06:37 -05:00
Adrian Prantl bf82ff61a6 Teach SROA to handle allocas with more than one dbg.declare.
It is technically legal for optimizations to create an alloca that is
used by more than one dbg.declare, if one or both of them are inlined
instances of aliasing variables.

Differential Revision: https://reviews.llvm.org/D85172
2020-08-04 15:54:51 -07:00
Arthur Eubanks f50b3ff02e [Hexagon] Use InstSimplify instead of ConstantProp
This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp.

Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.

Differential Revision: https://reviews.llvm.org/D85047
2020-08-04 15:42:39 -07:00
Eli Friedman 4a47f1c4ce [SelectionDAG][SVE] Support scalable vectors in getConstantFP()
Differential Revision: https://reviews.llvm.org/D85249
2020-08-04 15:32:43 -07:00
Krzysztof Parzyszek 09897b146a [RDF] Remove uses of RDFRegisters::normalize (deprecate)
This function has been reduced to an identity function for some time.
2020-08-04 17:02:12 -05:00
Matt Arsenault 486e84dfa4 AMDGPU/GlobalISel: Use live in helper function for returnaddress 2020-08-04 17:36:01 -04:00
Mircea Trofin 65b6dbf939 [llvm][NFC] Moved implementation of TrainingLogger outside of its decl
Also renamed a method - printTensor - to print; and added comments.
2020-08-04 14:35:35 -07:00
Matt Arsenault 89011fc3c9 AMDGPU/GlobalISel: Select llvm.returnaddress 2020-08-04 17:14:38 -04:00
Matt Arsenault f8fb7835d6 GlobalISel: Add utilty for getting function argument live ins
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.

This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.

I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
2020-08-04 16:55:55 -04:00
Eli Friedman 95efea4b93 [AArch64][SVE] Widen narrow sdiv/udiv operations.
The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit
integers.  If we see an 8-bit or 16-bit divide, widen the operands to 32
bits, and narrow the result.

Differential Revision: https://reviews.llvm.org/D85170
2020-08-04 13:22:15 -07:00
Ilya Leoshkevich 153df1373e [SanitizerCoverage] Fix types of __stop* and __start* symbols
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
2020-08-04 21:53:27 +02:00
Cameron McInally 0f2b47b6da [FastISel] Don't transform FSUB(-0, X) -> FNEG(X) in FastISel
This corresponds with the SelectionDAGISel change in D84056.

Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC.

Differential Revision: https://reviews.llvm.org/D85149
2020-08-04 14:42:53 -05:00
Yonghong Song 6d218b4adb BPF: support type exist/size and enum exist/value relocations
Four new CO-RE relocations are introduced:
  - TYPE_EXISTENCE: whether a typedef/record/enum type exists
  - TYPE_SIZE: the size of a typedef/record/enum type
  - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists
  - ENUM_VALUE: the enum value of an enum type

These additional relocations will make CO-RE bpf programs
more adaptive for potential kernel internal data structure
changes.

Differential Revision: https://reviews.llvm.org/D83878
2020-08-04 12:35:39 -07:00
Matt Arsenault 3e16e2152c GlobalISel: Handle llvm.localescape
This one is pretty easy and shrinks the list of unhandled
intrinsics. I'm not sure how relevant the insert point is. Using the
insert position of EntryBuilder will place this after
constants. SelectionDAG seems to end up emitting these after argument
copies and before anything else, but I don't think it really
matters. This also ends up emitting these in the opposite order from
SelectionDAG, but I don't think that matters either.

This also needs a fix to stop the later passes dropping this as a dead
instruction. DeadMachineInstructionElim's version of isDead special
cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded
from MachineInstr::isLabel (or why isDead doesn't check it).

I also noticed DeadMachineInstructionElim never considers inline asm
as dead, but GlobalISel will drop asm with no constraints.
2020-08-04 15:19:02 -04:00
Bardia Mahjour 3c0f347002 [NFC][LV] Vectorized Loop Skeleton Refactoring
This patch tries to improve readability and maintenance
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.

Reviewed By: pjeeva01

Differential Revision: https://reviews.llvm.org/D83824
2020-08-04 14:50:57 -04:00
Xavier Denis 29fe3fe615 [InstSimplify] Peephole optimization for icmp (urem X, Y), X
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: https://github.com/rust-lang/rust/issues/74938

Differential Revision: https://reviews.llvm.org/D85092
2020-08-04 20:48:37 +02:00
Nikita Popov 4564974504 [SCCP] Propagate inequalities
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.

Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.

Differential Revision: https://reviews.llvm.org/D83643
2020-08-04 20:20:52 +02:00
David Blaikie e31cfc4cd3 Fix -Wconstant-conversion warning with explicit cast
Introduced by fd6584a220

Following similar use of casts in AsmParser.cpp, for instance - ideally
this type would use unsigned chars as they're more representative of raw
data and don't get confused around implementation defined choices of
char's signedness, but this is what it is & the signed/unsigned
conversions are (so far as I understand) safe/bit preserving in this
usage and what's intended, given the API design here.
2020-08-04 10:41:27 -07:00
Xing GUO 12605bfd1f [DWARFYAML] Fix unintialized value Is64BitAddrSize. NFC.
This patch fixes the undefined behavior that reported by ubsan.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/44524/
2020-08-05 00:28:17 +08:00
Matt Arsenault 0de547ed4a AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES
Fixes verifier error with SGPR unmerges with 96-bit result types.
2020-08-04 12:27:34 -04:00
Cameron McInally 23adbac9ee [GlobalISel] Don't transform FSUB(-0, X) -> FNEG(X) in GlobalISel.
This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR.

This corresponds with the SelectionDAGISel change in D84056.

Differential Revision: https://reviews.llvm.org/D85139
2020-08-04 11:27:09 -05:00
Sanjay Patel a16882047a [InstSimplify] refactor min/max folds with shared operand; NFC 2020-08-04 12:21:05 -04:00
Fangrui Song 593e196297 [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable
for the advantage outlined by D83639 ([OptTable] Support grouped short options)

Some behavior changes:

* -i={0,false} is removed. Use --no-inlines instead.
* --demangle={0,false} is removed. Use --no-demangle instead
* -untag-addresses={0,false} is removed. Use --no-untag-addresses instead

Added a higher level API OptTable::parseArgs which handles optional
initial options populated from an environment variable, expands response
files recursively, and parses options.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D83530
2020-08-04 08:53:15 -07:00
Nemanja Ivanovic 14d726acd6 [PowerPC] Don't remove single swap between the load and store
The swap removal pass looks to remove swaps when a loaded value is swapped, some
number of lane-insensitive operations are performed and then the value is
swapped again and stored.

However, in a situation where we load the value, swap it and then store it
without swapping again, the pass erroneously removes the single swap. The
reason is that both checks in the same equivalence class:

- load feeds a swap
- swap feeds a store

pass. However, there is no check that the two swaps are actually a single swap.
This patch just fixes that.

Differential revision: https://reviews.llvm.org/D84785
2020-08-04 10:38:15 -05:00
Jay Foad 28e322ea93 [PowerPC] Custom lowering for funnel shifts
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.

Differential Revision: https://reviews.llvm.org/D83948
2020-08-04 16:30:49 +01:00
Jay Foad 8ec8ad868d [AMDGPU] Use fma for lowering frem
This gives shorter f64 code and perhaps better accuracy.

Differential Revision: https://reviews.llvm.org/D84516
2020-08-04 16:18:23 +01:00
Simon Pilgrim 6f0da46d53 [X86] getFauxShuffleMask - drop unnecessary computeKnownBits OR(X,Y) shuffle decoding.
Now that rG47cea9e82dda941e lets us aggressively decode multi-use shuffles for the OR(SHUFFLE(),SHUFFLE()) case we don't need the computeKnownBits variant any more.
2020-08-04 15:57:47 +01:00
Nemanja Ivanovic 62a933b72c [Support][PPC] Fix bot failures due to cd53ded557
Commit https://reviews.llvm.org/rGcd53ded557c3 attempts to fix the
computation in computeHostNumPhysicalCores() to respect Affinity.
However, the GLIBC wrapper of the affinity system call fails with
a default size of cpu_set_t on systems that have more than 1024 CPUs.
This just fixes the computation on such large machines.
2020-08-04 09:00:49 -05:00
Simon Pilgrim 051f293b78 [X86] Remove unused canScaleShuffleElements helper
The only use was removed at rG36750ba5bd0e9e72

Thanks to @nemanjai for the heads up
2020-08-04 14:51:23 +01:00
Simon Pilgrim 36750ba5bd [X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets.
Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.
2020-08-04 14:27:01 +01:00
Sanjay Patel 04e45ae1c6 [InstSimplify] fold nested min/max intrinsics with constant operands
This is based on the existing code for the non-intrinsic idioms
in InstCombine.

The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:

https://alive2.llvm.org/ce/z/-2bVbM
  define <2 x i8> @src(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    %m2 = umin <2 x i8> { 9, 9 }, %m
    ret <2 x i8> %m2
  }
  =>
  define <2 x i8> @tgt(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    ret <2 x i8> %m
  }
  Transformation doesn't verify!
  ERROR: Value mismatch

  Example:
  <2 x i8> %x = < undef, undef >

  Source:
  <2 x i8> %m = < #x00 (0)	[based on undef value], #x00 (0) >
  <2 x i8> %m2 = < #x00 (0), #x00 (0) >

  Target:
  <2 x i8> %m = < #x07 (7), #x10 (16) >
  Source value: < #x00 (0), #x00 (0) >
  Target value: < #x07 (7), #x10 (16) >
2020-08-04 08:44:48 -04:00
Sanjay Patel 20c71e55aa [InstSimplify] reduce code for min/max analysis; NFC
This should probably be moved up to some common area eventually
when there's another user.
2020-08-04 08:02:33 -04:00
Sander de Smalen bb3344c7d8 [AArch64][SVE] Add missing unwind info for SVE registers.
This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.

The CFI entries only cover a subset of the SVE callee-saves and
only encodes the lower 64-bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64-bits.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84044
2020-08-04 11:47:06 +01:00
Sander de Smalen fd6584a220 [AArch64][SVE] Fix CFA calculation in presence of SVE objects.
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:

     SP + offset + scalable_offset * VG

where VG is the Vector Granule register, containing the
number of 64bits 'granules' in a scalable vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84043
2020-08-04 11:47:06 +01:00
Paul Walker 4be13b15d6 [SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants.
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
care about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
2020-08-04 11:19:17 +01:00
Juneyoung Lee e734e8286b [JumpThreading] Remove cast's constraint
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85188
2020-08-04 19:09:25 +09:00
David Green 3c7e7d40a9 [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-08-04 10:43:42 +01:00
Meera Nakrani 20283ff491 [ARM] Generated SSAT and USAT instructions with shift
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests.

Differential Review: https://reviews.llvm.org/D85120
2020-08-04 09:38:17 +00:00
Simon Pilgrim 47cea9e82d Revert rG66e7dce714fab "Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero.""
[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED)

This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR().

This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.

Reapplied with fixed signed/unsigned comparisons.
2020-08-04 10:32:39 +01:00
Florian Hahn f7658241cb [AArch64] Consider instruction-level contract FMFs in combiner patterns.
Currently, instruction level fast math flags are not considered when
generating patterns for the machine combiner.

This currently leads to some missed opportunities to generate FMAs in
combination with `#pragma clang fp contract (fast)`.

For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.

With this patch, the same code is generated in both cases.

    float madd_contract(float a, float b, float c) {
    #pragma clang fp contract (fast)
      return (a * b) + c;
    }

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84930
2020-08-04 10:25:16 +01:00
Qiu Chaofan 6a78a8dd37 [NFC] [PowerPC] Refactor fp/int conversion lowering
For FP_TO_INT and INT_TO_FP lowering, we have direct-move and
non-direct-move methods. But they share some conversion logic, so we can
reduce redundant code by introducing new methods.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81818
2020-08-04 15:48:16 +08:00
Juneyoung Lee 6f97103b56 [JumpThreading] Don't limit the type of an operand
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.

Differential Revision: https://reviews.llvm.org/D84949
2020-08-04 16:21:58 +09:00
Fangrui Song b959906cb9 [PGO] Use multiple comdat groups for COFF
D84723 caused multiple definition issues (related to comdat) on Windows:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67465
2020-08-03 21:33:16 -07:00
Wang, Pengfei 6bc7ea2d8d [X86][AVX512] Fix build fail after D81548
Test function mask_cmp_128 failed during ISEL
LLVM ERROR: Cannot select: t37: v8i1 = X86ISD::KSHIFTL t48, TargetConstant:i8<4>
due to v8i1 only available under AVX512DQ.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D84922
2020-08-04 12:31:04 +08:00
Chen Zheng 45c46d180e [PowerPC] mark r+i as legal address mode for vector type after pwr9
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84735
2020-08-04 00:02:37 -04:00
Carl Ritson 57899934ea [AMDGPU] Make GCNRegBankReassign assign based on subreg banks
When scavenging consider the sub-register of the source operand
to determine the bank of a candidate register (not just sub0).
Without this it is possible to introduce an infinite loop,
e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between
$sgpr0 and SGPR_96:sub1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84910
2020-08-04 12:54:44 +09:00
Fangrui Song e56626e438 [PGO] Move __profc_ and __profvp_ from their own comdat groups to __profd_'s comdat group
D68041 placed `__profc_`,  `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:

* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`,  `__profd_` and (if exists) `__profvp_` should be retained or
  discarded. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
  used) but a non-prevailing group from another translation unit has `__profvp_`
  (the function is inlined into another and triggers value profiling), there
  will be a stray `__profvp_` if --gc-sections is not enabled.
  This has been fixed by 3d6f53018f.

Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.

For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84723
2020-08-03 20:35:50 -07:00
Max Kazantsev 7647c2716e [SimpleLoopUnswitch][NFC] Add option to always drop make.implicit metadata in non-trivial unswitching and save compile time
We might want this if we find out that using of MustExecute analysis is too expensive.
By default we do the analysis because its complexity does not exceed the complexity
of whole loop copying in unswitching. Follow-up for D84925.

Differential Revision: https://reviews.llvm.org/D85001
Reviewed By: asbirlea
2020-08-04 10:16:40 +07:00
Chen Zheng ba955397ac [SCEVExpander][PowerPC]clear scev rewriter before deleting instructions.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85130
2020-08-03 20:36:08 -04:00
Shinji Okumura ffe0066b62 [Attributor][NFC] Clang format 2020-08-04 09:04:12 +09:00
Christopher Tetreault c9e6887f83 [SVE] Remove bad calls to VectorType::getNumElements() from X86
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D85156
2020-08-03 16:34:10 -07:00
hgreving 509f5c4ec2 [MC] Fix memory leak when allocating MCInst with bump allocator
Adds the function createMCInst() to MCContext that creates a MCInst using
a typed bump alloctor.

MCInst contains a SmallVector<MCOperand, 8>. The SmallVector is POD only
for <= 8 operands. The default untyped bump pointer allocator of MCContext
does not delete the MCInst, so if the SmallVector grows, it's a leak.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46900.
2020-08-03 16:08:26 -07:00
Christopher Tetreault 3b92db4c84 [SVE] Remove bad call to VectorType::getNumElements() from AMDGPU
Differential Revision: https://reviews.llvm.org/D85151
2020-08-03 15:56:10 -07:00
Christopher Tetreault b5059b7140 [SVE] Remove bad call to VectorType::getNumElements() from ARM
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85152
2020-08-03 15:41:14 -07:00
Jordan Rupprecht af3ec731d5 [NFC][ARM] Silence unused variable in release builds 2020-08-03 15:21:44 -07:00
Christopher Tetreault b43791e701 [SVE] Remove bad calls to VectorType::getNumElements() from PowerPC
Differential Revision: https://reviews.llvm.org/D85154
2020-08-03 15:15:20 -07:00
Alina Sbirlea 1ce82015f6 [MemorySSA] Restrict optimizations after a PhiTranslation.
Merging alias results from different paths, when a path did phi
translation is not necesarily correct. Conservatively terminate such paths.
Aimed to fix PR46156.

Differential Revision: https://reviews.llvm.org/D84905
2020-08-03 14:46:41 -07:00
Mitch Phillips 9a05fa10bd [HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel
GlobalISel is the default ISel for aarch64 at -O0. Prior to D78465, GlobalISel
didn't have support for dealing with address-of-global lowerings, so it fell
back to SelectionDAGISel.

HWASan Globals require special handling, as they contain the pointer tag in the
top 16-bits, and are thus outside the code model. We need to generate a `movk`
in the instruction sequence with a G3 relocation to ensure the bits are
relocated properly. This is implemented in SelectionDAGISel, this patch does
the same for GlobalISel.

GlobalISel and SelectionDAGISel differ in their lowering sequence, so there are
differences in the final instruction sequence, explained in
`tagged-globals.ll`. Both of these implementations are correct, but GlobalISel
is slightly larger code size / slightly slower (by a couple of arithmetic
instructions). I don't see this as a problem for now as GlobalISel is only on
by default at `-O0`.

Reviewed By: aemerson, arsenm

Differential Revision: https://reviews.llvm.org/D82615
2020-08-03 14:28:44 -07:00
David Green 22916481c1 [ARM] Convert VPSEL to VMOV in tail predicated loops
VPSEL has slightly different semantics under tail predication (it can
end up selecting from Qn, Qm and Qd). We do not model that at the moment
so they block tail predicated loops from being formed.

This just converts them into a predicated VMOV instead (via a VORR),
allowing tail predication to happen whilst still modelling the original
behaviour of the input.

Differential Revision: https://reviews.llvm.org/D85110
2020-08-03 22:03:14 +01:00
Thomas Lively cb32792210 [WebAssembly] Implement prototype v128.load{32,64}_zero instructions
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.

This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.

Differential Revision: https://reviews.llvm.org/D84820
2020-08-03 13:54:00 -07:00
Mitch Phillips 66e7dce714 Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero."
This reverts commit 219f32f4b6.

Commit contains unsigned compasions that break bots that build with
-Wsign-compare.
2020-08-03 13:48:30 -07:00
Fangrui Song 11bb7c220c [MC] Set sh_link to 0 if the associated symbol is undefined
Part of https://bugs.llvm.org/show_bug.cgi?id=41734

LTO can drop externally available definitions. Such AssociatedSymbol is
not associated with a symbol. ELFWriter::writeSection() will assert.

Allow a SHF_LINK_ORDER section to have sh_link=0.

We need to give sh_link a syntax, a literal zero in the linked-to symbol
position, e.g. `.section name,"ao",@progbits,0`

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D72899
2020-08-03 13:43:48 -07:00
Jon Roelofs 7f1556f292 Fix typo: s/epomymous/eponymous/ NFC 2020-08-03 14:09:46 -06:00
Eli Friedman dca23ed895 [AArch64] Add missing isel patterns for fcvtzs/u intrinsic on v1f64.
Fixes test-suite compile failure caused by 8dfb5d7.

While I'm in the area, add some more test coverage to related
operations, to make sure we aren't missing any other patterns.
2020-08-03 13:04:59 -07:00
Lang Hames 777824b49d [llvm-jitlink] Add support for static archives and MachO universal archives.
Archives can now be specified as input files the same way that object
files are. Archives will always be linked after all objects (regardless
of the relative order of the inputs) but before any dynamic libraries or
process symbols.

This patch also relaxes matching for slice triples in
StaticLibraryDefinitionGenerator in order to support this feature:
Vendors need not match if the source vendor is unknown.
2020-08-03 12:58:00 -07:00
Hiroshi Yamauchi 3e89cbf38e [PGO] Enable the extended value profile buckets for mem op sizes.
Following up D81682 and enable the new, extended value profile buckets for mem
op sizes.

Differential Revision: https://reviews.llvm.org/D83903
2020-08-03 12:25:11 -07:00
Sanjay Patel 9e5cf6bde5 [InstSimplify] fold variations of max-of-min with common operand
https://alive2.llvm.org/ce/z/ZtxpZ3
2020-08-03 15:02:46 -04:00
Arthur Eubanks 456f38a971 Fix layering violation Transforms/Utils -> Scalar
Introduced in D85063.
2020-08-03 11:53:23 -07:00
Jian Cai c6334db577 [X86] support .nops directive
Add support of .nops on X86. This addresses llvm.org/PR45788.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D82826
2020-08-03 11:50:56 -07:00
Florian Hahn 1e392fc445 [ArgPromotion] Replace all md uses of promoted values with undef.
Currently, ArgPromotion may leave metadata uses of promoted values,
which will end up in the wrong function, creating invalid IR.

PR33641 fixed this for dead arguments, but it can be also be triggered
arguments with users that are promoted (see the updated test case).

We also have to drop uses to them after promoting them. We need to do
this after dealing with the non-metadata uses, so I also moved the empty
use case to the loop that deals with updating the arguments of the new
function.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D85127
2020-08-03 19:31:53 +01:00
Hiroshi Yamauchi f78f509c75 [PGO] Extend the value profile buckets for mem op sizes.
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).

Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.

Differential Revision: https://reviews.llvm.org/D81682
2020-08-03 11:04:32 -07:00
Joao Moreira f208c659fb [X86] Make ENDBR instruction a scheduling boundary
Instructions should not be scheduled across ENDBR instructions, as this would result in the ENDBR being displaced, breaking the parity needed for the Indirect Branch Tracking feature of CET.

Currently, the X86IndirectBranchTracking pass is later than the instruction scheduling in the pipeline, what causes the bug to be unnoticeable and very hard (if not unfeasible) to be triggered while compiling C files with the standard LLVM setup. Yet, for correctness and to prevent issues in future changes, the compiler should prevent the such scheduling.

Differential Revision: https://reviews.llvm.org/D84862
2020-08-03 10:47:23 -07:00
Simon Pilgrim 219f32f4b6 [X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero.
This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR().

This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.
2020-08-03 18:32:47 +01:00
Arthur Eubanks 7c19c89dd5 [NewPM][LoopVersioning] Port LoopVersioning to NPM
Reviewed By: ychen, fhahn

Differential Revision: https://reviews.llvm.org/D85063
2020-08-03 10:32:09 -07:00
Craig Topper ac82b918c7 [X86] Use h-register for final XOR of __builtin_parity on 64-bit targets.
This adds an isel pattern and special XOR8rr_NOREX instruction
to enable the use of h-registers for __builtin_parity. This avoids
a copy and a shift instruction. The NOREX instruction is in case
register allocation doesn't use the matching l-register for some
reason. If a R8-R15 register gets picked instead, we won't be
able to encode the instruction since an h-register can't be used
with a REX prefix.

Fixes PR46954
2020-08-03 10:10:17 -07:00
Mircea Trofin 4b1b109c51 [llvm] Add a parser from JSON to TensorSpec
A JSON->TensorSpec utility we will use subsequently to specify
additional outputs needed for certain training scenarios.

Differential Revision: https://reviews.llvm.org/D84976
2020-08-03 09:49:31 -07:00
Gui Andrade 3ebd1ba64f [MSAN] Instrument freeze instruction by clearing shadow
Freeze always returns a defined value. This also prevents msan from
checking the input shadow, which happened because freeze wasn't
explicitly visited.

Differential Revision: https://reviews.llvm.org/D85040
2020-08-03 16:42:17 +00:00
Florian Hahn ee1c12708a [SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts.
In some cases, it seems like we can get rid of unnecessary s/umins by
using information from the loop guards (unless I am missing something).

One place where this seems to be helpful in practice is when computing
loop trip counts. This patch just changes howManyGreaterThans for now.
Note that this requires a loop for which we can check 'is guarded'.

On SPEC2000/SPEC2006/MultiSource, there are some notable changes for
some programs in the number of loops unrolled and trip counts computed.

```
Same hash: 179 (filtered out)
Remaining: 58
Metric: scalar-evolution.NumTripCountsComputed

Program                                        base    patch   diff
 test-suite...langs-C/compiler/compiler.test    25.00   31.00  24.0%
 test-suite.../Applications/SPASS/SPASS.test   2020.00 2323.00 15.0%
 test-suite...langs-C/allroots/allroots.test    29.00   32.00  10.3%
 test-suite.../Prolangs-C/loader/loader.test    17.00   18.00   5.9%
 test-suite...fice-ispell/office-ispell.test   253.00  265.00   4.7%
 test-suite...006/450.soplex/450.soplex.test   3552.00 3692.00  3.9%
 test-suite...chmarks/MallocBench/gs/gs.test   453.00  470.00   3.8%
 test-suite...ngs-C/assembler/assembler.test    29.00   30.00   3.4%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   263.00  270.00   2.7%
 test-suite...rks/FreeBench/pifft/pifft.test   722.00  741.00   2.6%
 test-suite...count/automotive-bitcount.test    41.00   42.00   2.4%
 test-suite...0/253.perlbmk/253.perlbmk.test   1417.00 1451.00  2.4%
 test-suite...000/197.parser/197.parser.test   387.00  396.00   2.3%
 test-suite...lications/sqlite3/sqlite3.test   1168.00 1189.00  1.8%
 test-suite...000/255.vortex/255.vortex.test   173.00  176.00   1.7%

Metric: loop-unroll.NumUnrolled

Program                                        base   patch  diff
 test-suite...langs-C/compiler/compiler.test     1.00   3.00 200.0%
 test-suite.../Applications/SPASS/SPASS.test   134.00 234.00 74.6%
 test-suite...count/automotive-bitcount.test     3.00   4.00 33.3%
 test-suite.../Prolangs-C/loader/loader.test     3.00   4.00 33.3%
 test-suite...langs-C/allroots/allroots.test     3.00   4.00 33.3%
 test-suite...Source/Benchmarks/sim/sim.test    10.00  12.00 20.0%
 test-suite...fice-ispell/office-ispell.test    21.00  25.00 19.0%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    32.00  38.00 18.8%
 test-suite...006/450.soplex/450.soplex.test   300.00 352.00 17.3%
 test-suite...rks/FreeBench/pifft/pifft.test    60.00  69.00 15.0%
 test-suite...chmarks/MallocBench/gs/gs.test    57.00  63.00 10.5%
 test-suite...ngs-C/assembler/assembler.test    10.00  11.00 10.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   145.00 157.00  8.3%
 test-suite...000/197.parser/197.parser.test    43.00  46.00  7.0%
 test-suite...TimberWolfMC/timberwolfmc.test   205.00 214.00  4.4%
 Geomean difference                                           7.6%
```

Fixes https://bugs.llvm.org/show_bug.cgi?id=46939
Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D85046
2020-08-03 17:22:42 +01:00
Cameron McInally 31c7a2fd5c [FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder.
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D84056
2020-08-03 10:22:25 -05:00
Xing GUO 08649d4321 [DWARFYAML] Implement the .debug_loclists section.
This patch implements the .debug_loclists section. There are only two
DWARF expressions are implemented in this patch (DW_OP_consts,
DW_OP_stack_value). We will implement more in the future.

The YAML description of the .debug_loclists section is:

```
debug_loclists:
  - Format:              DWARF32 ## Optional
    Length:              0x1234  ## Optional
    Version:             5       ## Optional (5 by default)
    AddressSize:         8       ## Optional
    SegmentSelectorSize: 0       ## Optional (0 by default)
    OffsetEntryCount:    1       ## Optional
    Offsets:             [ 1 ]   ## Optional
    Lists:
      - Entries:
          - Operator:          DW_LLE_startx_endx
            Values:            [ 0x1234, 0x4321 ]
            DescriptorsLength: 0x1234             ## Optional
            Descriptors:
              - Operator: DW_OP_consts
                Values:   [ 0x1234 ]
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84234
2020-08-03 23:20:15 +08:00
Shinji Okumura 1c2777f585 [NFC][APInt][DenseMapInfo] Move DenseMapAPIntKeyInfo into DenseMap.h as DenseMapInfo<APInt>
`DenseMapAPIntKeyInfo` is now located in `lib/IR/LLVMContextImpl.h`.
Moved it into `include/ADT/DenseMapInfo.h` to use it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85131
2020-08-03 23:31:13 +09:00
Sanjay Patel 23693ffc3b [InstCombine] reduce xor-of-or's bitwise logic (PR46955); 2nd try
The 1st try at this (rG2265d01f2a5b) exposed what looks like
unspecified behavior in C/C++ resulting in test variations.

The arguments to BinaryOperator::CreateAnd() were both IRBuilder
function calls, and the order in which they execute determines
the order of the new instructions in the IR. But the order of
function arg evaluation is not fixed by the rules of C/C++, so
depending on compiler config, the test would fail because the
test expected a single fixed ordering of instructions.

Original commit message:
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.

http://bugs.llvm.org/PR46955

https://alive2.llvm.org/ce/z/2h6QTq
2020-08-03 10:21:56 -04:00
Xing GUO 2d8ca4ae2b [DWARFYAML] Offsets should be omitted when the OffsetEntryCount is 0.
The offsets field should be omitted when the 'OffsetEntryCount' entry is
specified to be 0.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85006
2020-08-03 22:06:08 +08:00
Matt Arsenault 42a9f6c554 GlobalISel: Handle arbitrary FewerElementsVector for G_IMPLICIT_DEF 2020-08-03 09:14:08 -04:00
Matt Arsenault 2414bab5d7 AMDGPU/GlobalISel: Remove old hacks for boolean selection
There were various hacks used to try to avoid making s1 SGPR vs. s1
VCC ambiguous after constraining the register before we had a strategy
to deal with this. This also attempted to handle undef operands, which
are now illegal gMIR.
2020-08-03 09:04:14 -04:00
Matt Arsenault 1782fbbc69 GlobalISel: Reimplement moreElementsVectorDst
Use pad with undef and unmerge with unused results. This is annoyingly
similar to several other places in LegalizerHelper, but they're all
slightly different.
2020-08-03 09:03:48 -04:00
Sanjay Patel f19a9be385 Revert "[InstCombine] reduce xor-of-or's bitwise logic (PR46955)"
This reverts commit 2265d01f2a.
Seeing bot failures after this change like:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/42586
2020-08-03 08:58:41 -04:00
Matt Arsenault fd63e46941 AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic
Should also apply this to the non-scalar buffer loads.
2020-08-03 08:54:29 -04:00
Simon Pilgrim 99a971cadf [X86][SSE] Start shuffle combining from ANY_EXTEND_VECTOR_INREG on SSE targets
We already do this on AVX (+ for ZERO_EXTEND_VECTOR_INREG), but this enables it for all SSE targets - we attempted something similar back at rL357057 but hit issues with the ZERO_EXTEND_VECTOR_INREG handling (PR41249).

I'm still looking at the vector-mul.ll regression - which is due to 32-bit targets performing the load as a f64, resulting in the shuffle combiner thinking it has to create a shuffle in the float domain.
2020-08-03 13:41:48 +01:00
Matt Arsenault d8ef1d1251 AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext
These should probably not be legal in the first place, but that might
also be a pain.
2020-08-03 08:36:41 -04:00
Sanjay Patel 2265d01f2a [InstCombine] reduce xor-of-or's bitwise logic (PR46955)
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.

http://bugs.llvm.org/PR46955

https://alive2.llvm.org/ce/z/2h6QTq
2020-08-03 08:31:43 -04:00
Nicholas Guy 18279a54b5 [ARM] Fix IT block generation after Thumb2SizeReduce with -Oz
Fixes a regression caused by D82439, in which IT blocks were no longer being
generated when -Oz is present. This was due to the CPSR register being marked as
dead, while this case was not accounted for.

Differential Revision: https://reviews.llvm.org/D83667
2020-08-03 13:20:32 +01:00
Florian Hahn 98db27711d [LV] Do not check widening decision for instrs outside of loop.
No widening decisions will be computed for instructions outside the
loop. Do not try to get a widening decision. The load/store will be just
a scalar load, so treating at as normal should be fine I think.

Fixes PR46950.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85087
2020-08-03 10:09:24 +01:00
Xing GUO ef005f204b [MachOYAML] Remove redundant variable initialization. NFC.
The value of `is64Bit` is initialized in the constructor body.
2020-08-03 16:17:28 +08:00
Shinji Okumura 434cf2ded3 [Attributor] Check nonnull attribute violation in AAUndefinedBehavior
This patch makes it possible to handle nonnull attribute violation at callsites in AAUndefinedBehavior.
If null pointer is passed to callee at a callsite and the corresponding argument of callee has nonnull attribute, the behavior of the callee is undefined.
In this patch, violations of argument nonnull attributes is only handled.
But violations of returned nonnull attributes can be handled and I will implement that in a follow-up patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84733
2020-08-03 17:12:50 +09:00
Igor Kudrin 414b9bec6d [DebugInfo] Make DIEDelta::SizeOf() more explicit. NFCI.
The patch restricts DIEDelta::SizeOf() to accept only DWARF forms that
are actually used in the LLVM codebase. This should make the use of the
class more explicit and help to avoid issues similar to fixed in D83958
and D84094.

Differential Revision: https://reviews.llvm.org/D84095
2020-08-03 15:04:15 +07:00
Igor Kudrin f98e03a35d [DebugInfo] Fix misleading using of DWARF forms with DIELabel. NFCI.
DIELabel can emit only 32- or 64-bit values, while it was created in
some places with DW_FORM_udata, which implies emitting uleb128.
Nevertheless, these places also expected to emit U32 or U64, but just
used a misleading DWARF form. The patch updates those places to use more
appropriate DWARF forms and restricts DIELabel::SizeOf() to accept only
forms that are actually used in the LLVM codebase.

Differential Revision: https://reviews.llvm.org/D84094
2020-08-03 15:04:08 +07:00
Igor Kudrin 8feff8d14f [DebugInfo] Fix a comment and a variable name. NFC.
DebugLocListIndex keeps the index of an entry list, not the offset.

Differential Revision: https://reviews.llvm.org/D84093
2020-08-03 15:04:00 +07:00
Igor Kudrin 4e10a18972 [DebugInfo] Make DIELocList::SizeOf() more explicit. NFCI.
DIELocList is used with a limited number of DWARF forms, see the only
place where it is instantiated, DwarfCompileUnit::addLocationList().

The patch marks the unexpected execution path in DIELocList::SizeOf()
as unreachable, to reduce ambiguity.

Differential Revision: https://reviews.llvm.org/D84092
2020-08-03 15:03:37 +07:00
Fangrui Song 40da58a04b [MC] Default MCAsmBackend::mayNeedRelaxation() to false 2020-08-02 22:13:59 -07:00
QingShan Zhang 62e4644616 [NFC][PowerPC] Add a multiclass for fsetcc to define them in a uniform way
This is a refactor patch to prepare for adding the support for strict-fsetcc
in PowerPC backend. We want to move their definition into a uniform way so that,
we could add the strict node easier.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D81712
2020-08-03 03:28:03 +00:00
StephenFan a96921afa7 [RISCV] eliminate the repetition declare of SDLoc DL
Differential revision: https://reviews.llvm.org/D85002
2020-08-03 10:24:30 +08:00
Fangrui Song b497665d98 Reland D64327 [MC][ELF] Allow STT_SECTION referencing SHF_MERGE on REL targets
This drops a GNU gold workaround and reverts the revert commit rL366708.

  Before binutils 2.34, gold -O2 and above did not correctly handle R_386_GOTOFF to
  SHF_MERGE|SHF_STRINGS sections: https://sourceware.org/bugzilla/show_bug.cgi?id=16794

From the original review:

  ... it reduced the size of a big ARM-32 debug image by 33%. It contained ~68M
  of relocations symbols out of total ~71M symbols (96% of symbols table was
  generated for relocations with symbol).

-Wl,-O2 (and -Wl,-O3) is so rare that we should just lower the
optimization level for LLVM_LINKER_IS_GOLD rather than pessimizing all users.
2020-08-02 18:05:17 -07:00
Florian Hahn 599955eb56 Recommit "[IPConstProp] Remove and move tests to SCCP."
This reverts commit 59d6e814ce.

The cause for the revert (3 clang tests running opt -ipconstprop) was
fixed by removing those lines.
2020-08-02 22:23:54 +01:00
Vitaly Buka 08cf49658c [StackSafety, NFC] Don't insert empty objects into the map
Result should be the same but it makes generateParamAccessSummary 5x
faster.
2020-08-02 13:58:56 -07:00
Craig Topper 64516ec7c1 [X86] Use parity flag from byte test/cmp instruction for __builtin_parity when input fits in 8 bits.
If the upper bits of the __builtin_parity idiom are known to be
0 we were previously emitting an xor with 0 to get the parity flag.
But we can use cmp/test instead which may expose opportunities for
load folding or combining an AND.
2020-08-02 10:45:04 -07:00
Simon Pilgrim e202236721 [IR] Add IRBuilderBase::CreateVectorSplat(ElementCount EC) variant
As discussed on D81500, this adds a more general ElementCount variant of the build helper and converts the (non-scalable) unsigned NumElts variant to use it internally.
2020-08-02 16:55:38 +01:00
Sanjay Patel 4abc69c6f5 [InstSimplify] fold max (max X, Y), X --> max X, Y
https://alive2.llvm.org/ce/z/VGgG3M
2020-08-02 11:50:58 -04:00
Matt Arsenault 212570abcf GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT
For AMDGPU, vectors with elements < 32 bits should be indexed in
32-bit elements and the desired bits extracted from there. For
elements > 64-bits, these should be reduce to 64/32 elements to enable
the normal dynamic indexing paths.

In the dynamic index cases, this produces shorter code most of the
time. This does immediately regress the constant index cases, but this
should be fixed once we have the most basic of shift combines.

The element size > 64 case is pretty much ported from the exisiting
DAG implementation for extract element promote. The increasing element
size case is new.
2020-08-02 10:42:07 -04:00
Simon Pilgrim 00d0f354f2 X86InstrInfo.cpp - fix include ordering. NFCI. 2020-08-02 15:34:18 +01:00
Simon Pilgrim 7dd4f03595 Use merge null and isa<> tests into isa_and_nonnull<>. NFCI. 2020-08-02 15:34:18 +01:00
Simon Pilgrim b8ffbf0e02 [DAG] TargetLowering::expandMUL_LOHI - pass SDLoc as const&
Try to be more consistent with the SDLoc param in the TargetLowering methods.

This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.
2020-08-02 15:31:36 +01:00
Simon Pilgrim d14a22da5e [DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const&
Try to be more consistent with the SDLoc param in the TargetLowering methods.
2020-08-02 15:12:02 +01:00
Shinji Okumura 376b64926b Revert "[Attributor] AAPotentialValues Interface"
The commit cause build failure.
2020-08-02 22:49:52 +09:00
Nikita Popov a0addbb4ec [InstSimplify] Reduce code duplication in icmp of binop folds (NFC)
For folds where we check for the binop on both the LHS and RHS,
extract a function that expects it on the LHS and call it with
swapped order.
2020-08-02 15:47:18 +02:00
Xing GUO 8d1b9505f2 [DWARFYAML][debug_aranges] Make the 'Descriptors' field optional. 2020-08-02 21:39:44 +08:00
Simon Pilgrim 20fbbbc583 [X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI.
Fixes clang-tidy warning.
2020-08-02 14:32:23 +01:00
Simon Pilgrim d7e2616741 [X86] Pass SDLoc by const reference. NFCI. 2020-08-02 14:32:22 +01:00
Simon Pilgrim 3f276840b6 [X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI.
Fixes clang-tidy warning.
2020-08-02 14:32:22 +01:00
Simon Pilgrim 2700311cce [X86] combineX86ShuffleChain - pull out repeated RootVT.getSizeInBits() calls. NFCI. 2020-08-02 14:32:22 +01:00
Shinji Okumura d3f01b6681 [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-02 19:12:17 +09:00
Craig Topper 56166a3a52 [X86] Improve parity idiom recognition to handle (and (truncate (ctpop X)), 1).
Fixes part of PR46954
2020-08-01 22:59:43 -07:00
AK 20797989ea Outline non returning functions unless a longjmp
__assert_fail, abort, exit etc. are cold.
TODO: outline throw

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,tejohnson,fhahn

Differential Revision: https://reviews.llvm.org/D69257
2020-08-01 22:16:14 -07:00
Kazu Hirata 60434989e5 Use llvm::is_contained where appropriate (NFC)
Use llvm::is_contained where appropriate (NFC)

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D85083
2020-08-01 21:51:06 -07:00
Lang Hames 0f5b70769d [llvm-jitlink] Add -phony-externals option to suppress unresolved externals.
The -phony-externals option adds a generator which explicitly defines any
otherwise unresolved externals as null. This transforms link-time
unresolved-symbol errors into potential runtime null pointer accesses
(if an unresolved external is actually accessed during execution).

This option can be useful in -harness mode to avoid having to mock a
large number of symbols that are not reachable at runtime (e.g. unused
methods referenced by a class vtable).
2020-08-01 18:33:44 -07:00
Nikita Popov 25af353b0e [NewPM][LVI] Abandon LVI after CVP
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.

This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.

Differential Revision: https://reviews.llvm.org/D84959
2020-08-01 23:47:46 +02:00
Craig Topper e297d928dc [X86] Add assembler support for {disp8} and {disp32} to control the size of displacement used for memory operands.
These prefixes should override the default behavior and force a larger immediate size. I don't believe gas issues any warning if you use {disp8} when a 32-bit displacement is already required. And this patch doesn't either.

This completes the {disp8} and {disp32} support from PR46650.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D84793
2020-08-01 13:26:35 -07:00
Craig Topper 85b5315dbe [InstSimplify] Fold abs(abs(x)) -> abs(x)
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.

Differential Revision: https://reviews.llvm.org/D85053
2020-08-01 13:25:00 -07:00
Craig Topper 4a19e6156e [InstCombine] Fold abs(-x) -> abs(x)
Negating the input doesn't matter. I left a FIXME to copy the nsw flag if its present on the neg but not on the abs.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85055
2020-08-01 13:25:00 -07:00
Florian Hahn 05b44f7eae [LCSSA] Provide option for caller to clean up unused PHIs.
formLCSSAForInstructions is used by SCEVExpander, which tracks all
inserted instructions including LCSSA phis using asserting value
handles. This means cleanup needs to happen in the caller.

Extend formLCSSAForInstructions  to take an optional pointer to a
vector. If this argument is non-nullptr, instead of directly deleting
the phis, add them to the vector, so the caller can process them.

This should address various PPC buildbot failures, including
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/40567
2020-08-01 20:43:19 +01:00
Simon Pilgrim 82a5c848e7 [X86][AVX512] Fold concat(and(x,y),and(z,w)) -> and(concat(x,z),concat(y,w)) for 512-bit vectors
Helps vpternlog folding on non-AVX512BW targets
2020-08-01 20:34:39 +01:00
Simon Pilgrim bb13c34c3a [X86][AVX] Ensure we only combine to PSHUFLW/PSHUFHW on supporting targets
Noticed while investigating combining from concatenated shuffle vectors, we weren't checking that PSHUFLW/PSHUFHW was legal - we were depending on lowering splitting to subvectors.
2020-08-01 19:18:11 +01:00
Florian Hahn a9b06a2c14 [LCSSA] Use IRBuilder for PHI creation.
Use IRBuilder instead PHINode::Create. This should not impact the
generated code, but IRBuilder provides a way to register callbacks for
inserted instructions, which is convenient for some users.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85037
2020-08-01 18:44:15 +01:00
David Green fd69df62ed [ARM] Distribute post-inc for Thumb2 sign/zero extending loads/stores
This adds sign/zero extending scalar loads/stores to the MVE
instructions added in D77813, allowing us to create up more post-inc
instructions. These are comparatively simple, compared to LDR/STR (which
may be better turned into an LDRD/LDM), but still require some additions
over MVE instructions. Because there are i12 and i8 variants of the
offset loads/stores dealing with different signs, we may need to convert
an i12 address to a i8 negative instruction. t2LDRBi12 can also be
shrunk to a tLDRi under the right conditions, so we need to be careful
with codesize too.

Differential Revision: https://reviews.llvm.org/D78625
2020-08-01 14:01:18 +01:00
Sanjay Patel 04b99a4d18 [InstSimplify] simplify abs if operand is known non-negative
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.

Differential Revision: https://reviews.llvm.org/D85043
2020-08-01 07:47:06 -04:00
Simon Pilgrim 1b1901536a [X86][AVX] Extend v2f64 BROADCAST(LOAD) -> BROADCAST_LOAD to v2i64/v4f32/v4i32
Minor precursor fix for D66004, but helps the SSE41 tests as well as they run with -disable-peephole
2020-08-01 12:28:29 +01:00
Evgeny Leviant e73f5d86f1 [MachineVerifier] Refactor calcRegsPassed. NFC
Patch improves performance of verify-machineinstrs pass up to 10x.
Differential revision: https://reviews.llvm.org/D84105
2020-08-01 12:58:52 +03:00
Craig Topper 75f134eec1 [X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code.
Now we try to load and broadcast together for operand 1. Followed
by load and broadcast for operand 1. Previously we tried load
operand 1, load operand 1, broadcast operand 0, broadcast operand 1.

Now we have a single helper that tries load and broadcast for
one operand that we can just call twice.
2020-07-31 23:57:13 -07:00
Chen Zheng 8c5edf5023 [SCEV] don't query getSCEV() for incomplete phis
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to same value before get SCEV expression.

Reviewed By: lebedev.ri, mkazantsev

Differential Revision: https://reviews.llvm.org/D77560
2020-08-01 02:38:54 -04:00
Craig Topper 1bd7046e4c [X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI 2020-07-31 21:39:10 -07:00
Justin Hibbits 7e9153e940 PowerPC: Don't lower SELECT_CC to PPCISD::FSEL on SPE
SPE doesn't have a fsel instruction, so don't try to lower to it.

This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error.

Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D77773
2020-07-31 22:52:47 -05:00
Justin Hibbits 914dbf4808 PowerPC: Fix SPE extloadf32 handling.
The patterns were incorrect copies from the FPU code, and are
unnecessary, since there's no extended load for SPE.  Just let LLVM
itself do the work by marking it expand.

Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D78670
2020-07-31 22:42:57 -05:00
Kazushi (Jam) Marukawa 605fd4d77c [VE] Change calling convention to follow ABI
Change to expand all arguments and return values to i64 to follow ABI.
Update regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D84581
2020-08-01 10:08:54 +09:00
Huihui Zhang 01bfe2e494 [AArch64][SVE] Allow vector of pointers as legal type for masked load/store.
Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the
load/store data to be a vector of any integer, floating-point or pointer data type.

Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D85045
2020-07-31 17:30:23 -07:00
Craig Topper 93c678a79b [X86] Simplify vpternlog immediate selection.
Rather than hardcoding immediate values for 12 different combinations
in a nested pair of switches, we can perform the matched logic
operation on 3 magic constants to calculate the immediate.

Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936
for making me realize I could do this.
2020-07-31 17:16:27 -07:00
Hsiangkai Wang 47a4a27f47 Upgrade MC to v0.9.
Differential revision: https://reviews.llvm.org/D80802
2020-08-01 07:42:06 +08:00
Craig Topper 86dea1f39b [ValueTracking] Improve llvm.abs handling in computeKnownBits.
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.

Differential Revision: https://reviews.llvm.org/D85000
2020-07-31 15:55:03 -07:00
Sriraman Tallam ca6b6d40ff Rename basic block sections options to be consistent.
D68049 created options for basic block sections: -fbasic-block-sections=,
-funique-basic-block-section-names. Rename options in llc and lld (--lto-)
to be consistent. Specifically,

+ Rename basicblock-sections to basic-block-sections
+ Rename unique-bb-section-names to unique-basic-block-section-names

Differential Revision: https://reviews.llvm.org/D84462
2020-07-31 11:50:55 -07:00
Sidharth Baveja b7cfa6ca92 [Loop Peeling] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D83056
2020-07-31 18:31:58 +00:00
Fangrui Song cd53ded557 [Support] Fix computeHostNumPhysicalCores() to respect affinity
computeHostNumPhysicalCores() is designed to respect CPU affinity.
D84764 used sysconf(_SC_NPROCESSORS_ONLN) which does not respect
affinity.
SupportTests Threading.PhysicalConcurrency may fail if taskset -c is specified.
2020-07-31 11:20:15 -07:00
Sanjay Patel e591713bff [ConstantFolding] fold abs intrinsic
The handling for minimum value is similar to cttz/ctlz with 0 just above this case.

Differential Revision: https://reviews.llvm.org/D84942
2020-07-31 14:08:44 -04:00
Hans Wennborg 6a3b07a4bf RuntimeDyldELF: report_fatal_error instead of asserting for unimplemented relocations (PR46816)
This fixes the ExecutionEngine/MCJIT/stubs-sm-pic.ll test in no-asserts
builds which is set to XFAIL on some platforms like 32-bit x86. More
importantly, we probably don't want to silently error in these cases.

Differential revision: https://reviews.llvm.org/D84390
2020-07-31 20:06:47 +02:00
Craig Topper 0e0aebc527 [ValueTracking] Add ComputeNumSignBits support for llvm.abs intrinsic
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.

Differential Revision: https://reviews.llvm.org/D84971
2020-07-31 10:59:12 -07:00
Teresa Johnson 1479cdfe4f [ThinLTO] Compile time improvement to propagateAttributes
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.

I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.

Differential Revision: https://reviews.llvm.org/D84985
2020-07-31 10:54:02 -07:00
Fangrui Song c068e9c8c1 [Support][CommandLine] Delete unused llvm:🆑:ParseEnvrironmentOptions
The function was added in 2003. It is not used and can be emulated with ParseCommandLineOptions.
2020-07-31 10:48:09 -07:00
Albion Fung 93fd8dbdc2 [PowerPC] Add Vector String Isolate instruction definitions and MC Tests
This patch implements the instruction definition and MC tests for the vector
string isolate instructions.

Differential Revision: https://reviews.llvm.org/D84197
2020-07-31 12:32:29 -05:00
Florian Hahn 3b0d30ffd3 [SCEVExpander] Name temporary instructions for LCSSA insertion (NFC). 2020-07-31 18:16:46 +01:00
Aditya Nandakumar 2144a3bdbb [GISel] Add combiners for G_INTTOPTR and G_PTRTOINT
https://reviews.llvm.org/D84909

Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.

Patch by mkitzan
2020-07-31 10:13:36 -07:00
Hongtao Yu d23c1d6a8d [AutoFDO] Avoid merging inlinee samples multiple times
A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire.

This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D84997
2020-07-31 09:30:05 -07:00
Sameer Arora df69492cdf [llvm-libtool-darwin] Refactor Slice and writeUniversalBinary
Refactoring `Slice` class and function `createUniversalBinary` from
`llvm-lipo` into  MachOUniversalWriter. This refactoring is necessary so
as to use the refactored code for creating universal binaries under
llvm-libtool-darwin.

Reviewed by alexshap, smeenai

Differential Revision: https://reviews.llvm.org/D84662
2020-07-31 09:22:35 -07:00
Benjamin Kramer c6f08b14d4 Hide some internal symbols. NFC. 2020-07-31 17:28:02 +02:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Xing GUO 74b02d73e3 [DWARFYAML] Make the debug_aranges entry optional.
This patch makes the 'debug_aranges' entry optional. If the entry is
empty, yaml2obj will only emit the header for it.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84921
2020-07-31 20:18:53 +08:00
Xing GUO 760e4f2202 [DWARFYAML] Add helper function getDWARFEmitterByName(). NFC.
In this patch, we add a helper function getDWARFEmitterByName(). This
function returns the proper DWARF section emitting method by the name.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84952
2020-07-31 20:07:39 +08:00
Xing GUO cbf5bf513b [DWARFYAML] Add emitDebug[GNU]Pub[names/types] functions. NFC.
In this patch, emitDebugPubnames(), emitDebugPubtypes(),
emitDebugGNUPubnames(), emitDebugGNUPubtypes() are added.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85003
2020-07-31 20:05:30 +08:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
QingShan Zhang 9b04fec002 [PowerPC] Retrieve the offset from load/store if it stores to stack slots
Scheduler will try to retrieve the offset and base addr to determine if two
loads/stores are disjoint memory access. PowerPC failed to handle this for
frame index which will bring extra memory dependency for loads/stores.

Reviewed By: jji

Differential Revision: https://reviews.llvm.org/D84308
2020-07-31 07:08:20 +00:00
Juneyoung Lee ad48367722 [JumpThreading] Let SimplifyPartiallyRedundantLoad look into freeze
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84944
2020-07-31 15:28:24 +09:00
Fangrui Song 1cc210383b [MC] Support infix operator !
Disabled for Darwin mode.

Also disabled for ARM which has compatible aliases (implied 'sp' operand
in 'srs*' instructions like 'srsda #31!').
2020-07-30 23:25:53 -07:00
Craig Topper 30a0dbb70d [X86] Remove x86_sse42_crc32_64_64 from X86TTIImpl::simplifyDemandedUseBitsIntrinsic
It doesn't do any simplifying. It just computes known bits. We
can just let InstCombine call computeKnownBits which will handle
this just as well.
2020-07-30 21:51:23 -07:00
Max Kazantsev 8aaeee5fb6 [SimpleLoopUnswitch] Preserve make.implicit in non-trivial unswitch if legal
We can preserve make.implicit metadata in the split block if it is
guaranteed that after following the branch we always reach the block
where processing of null case happens, which is equivalent to
"initial condition must execute if the loop is entered".

Differential Revision: https://reviews.llvm.org/D84925
Reviewed By: asbirlea
2020-07-31 11:38:43 +07:00
Max Kazantsev d889e17eca [SimpleLoopUnswitch] Drop make.implicit metadata in case of non-trivial unswitching
Non-trivial unswitching simply moves terminator being unswitch from the loop
up to the switch block. It also preserves all metadata that was there. It might not
be a correct thing to do for `make.implicit` metadata. Consider case:
```
for (...) {
  cond = // computed in loop
  if (cond) return X;
  if (p == null) throw_npe(); !make implicit
}
```
Before the unswitching, if `p` is null and we reach this check, we are guaranteed
to go to `throw_npe()` block. Now we unswitch on `p == null` condition:
```
if (p == null) !make implicit {
  for (...) {
    if (cond) return X;
    throw_npe()
  }
} else {
  for (...) {
    if (cond) return X;
  }
}
```
Now, following `true` branch of `p == null` does not always lead us to
`throw_npe()` because the loop has side exit. Now, if we run ImplicitNullCheck
pass on this code, it may end up making the unswitch condition implicit. This may
lead us to turning normal path to `return X` into signal-throwing path, which is
not efficient.

Note that this does not happen during trivial unswitch: it guarantees that we do not
have side exits before condition being unswitched.

This patch fixes this situation by unconditional dropping of `make.implicit` metadata
when we perform non-trivial unswitch. We could preserve it if we could prove that the
condition always executes. This can be done as a follow-up.

Differential Revision: https://reviews.llvm.org/D84916
Reviewed By: asbirlea
2020-07-31 11:33:02 +07:00
Wei Mi 836991d367 Fix a crash when the sample profile uses md5 and -sample-profile-merge-inlinee
is enabled.

When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.

Differential Revision: https://reviews.llvm.org/D84994
2020-07-30 21:21:06 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Craig Topper 916d9e1877 [X86] Pass the OperandVector by reference to ParseIntelOperand and ParseRoundingMode. NFCI
Similar to what was recently done to ParseATTOperand. Make
ParseIntelOperand directly responsible for adding to the operand
vector instead of returning the operand. Return a bool for error.

Remove ErrorOperand since it is no longer used.
2020-07-30 19:52:38 -07:00
Arthur Eubanks 47acbcf09a [tbaa] Rename type-based-aa -> tbaa
For consistency with legacy pass name.
Helps with 37 instances of "unknown pass name 'tbaa'" in check-llvm under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D84967
2020-07-30 19:51:35 -07:00
Vitaly Buka b256cb88a7 [ValueTracking] Remove AllocaForValue parameter
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.

In followup patches I'd like to replace the function with
GetUnderlyingObjects.

Depends on D84616.

Differential Revision: https://reviews.llvm.org/D84617
2020-07-30 18:48:34 -07:00
Vitaly Buka 61cab352e3 [NFC] Move findAllocaForValue into ValueTracking.h
Differential Revision: https://reviews.llvm.org/D84616
2020-07-30 18:22:59 -07:00
Scott Constable ec1445c5af [X86] Fix for ballooning compile times due to Load Value Injection (LVI) mitigations
Fix for the issue raised in https://github.com/rust-lang/rust/issues/74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
2020-07-30 17:22:33 -07:00
Craig Topper 3ad09fd03c [X86] Separate CPU Feature lists in X86.td between architecture features and tuning features
After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before.

As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed.

Differential Revision: https://reviews.llvm.org/D84879
2020-07-30 17:19:19 -07:00
kuterd 49def10e02 [Attributor] Add time trace support.
This patch addes time trace functionality to have a better understanding
of the analysis times.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84980
2020-07-31 03:08:50 +03:00
Craig Topper 24f5235d93 [ValueTracking] Add basic computeKnownBits support for llvm.abs intrinsic
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.

Differential Revision: https://reviews.llvm.org/D84963
2020-07-30 16:26:54 -07:00
Eli Friedman 7e88efa7c5 [LegalizeTypes][SVE] Support widen/split legalization for SPLAT_VECTOR
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.

Differential Revision: https://reviews.llvm.org/D84706
2020-07-30 16:17:45 -07:00
Amara Emerson 09f9f7dd1b [AArch64][GlobalISel] Add legalization & selection support for G_INTRINSIC_LRINT.
Differential Revision: https://reviews.llvm.org/D84552
2020-07-30 16:14:56 -07:00
Lang Hames 8ce8cee1e1 [llvm-jitlink] Add -harness option to llvm-jitlink.
The -harness option enables new testing use-cases for llvm-jitlink. It takes a
list of objects to treat as a test harness for any regular objects passed to
llvm-jitlink.

If any files are passed using the -harness option then the following
transformations are applied to all other files:

  (1) Symbols definitions that are referenced by the harness files are promoted
      to default scope. (This enables access to statics from test harness).

  (2) Symbols definitions that clash with definitions in the harness files are
      deleted. (This enables interposition by test harness).

  (3) All other definitions in regular files are demoted to local scope.
      (This causes untested code to be dead stripped, reducing memory cost and
      eliminating spurious unresolved symbol errors from untested code).

These transformations allow the harness files to reference and interpose
symbols in the regular object files, which can be used to support execution
tests (including fuzz tests) of functions in relocatable objects produced by a
build.
2020-07-30 15:26:19 -07:00
Lang Hames 9f1dcdca71 [JITLink] Allow JITLinkContext::notifyResolved to return an Error.
This allows clients to detect invalid transformations applied by JITLink passes
(e.g. inserting or removing symbols in unexpected ways) and terminate linking
with an error.

This change is used to simplify the error propagation logic in
ObjectLinkingLayer.
2020-07-30 15:26:18 -07:00
Matt Arsenault e56e9022bc AMDGPU: Fix liveness errors when copying AGPR tuples
Avoid recursively calling copyPhysReg for AGPR handling. This was
dropping the necessary super register implicit defs to avoid liveness
verifier errors.
2020-07-30 18:13:04 -04:00
Changpeng Fang 243376cdc7 AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst
Summary:
  This is in response to the review of https://reviews.llvm.org/D84873:
The expensive check should be reordered last

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D84890
2020-07-30 14:37:06 -07:00
Nikita Popov 9ebeac6788 [ConstantRange][CVP] Make use of abs poison flag
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.

Importantly, this recognizes that abs with poison flag is actually
non-negative...
2020-07-30 23:06:10 +02:00
Jon Roelofs afae6d97fa [SelectionDAG] Fix lowering of vector geps
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).

Fixes: rdar://66016901

Differential Revision: https://reviews.llvm.org/D84884
2020-07-30 14:56:53 -06:00
Nikita Popov 94f8120cb9 [ConstantRange] Support abs with poison flag
This just adds the ConstantRange support, including exhaustive
testing. It's not wired up to the IR intrinsic flag yet.
2020-07-30 22:49:28 +02:00
Nikita Popov d8a98a9c35 [ConstantRange][CVP] Compute min/max/abs intrinsic ranges
Wire up ConstantRange::intrinsic() to the existing primitives for
min, max and abs.

The poison flag on abs is not yet taken into account.
2020-07-30 22:21:34 +02:00
Nikita Popov 4c16eafe12 [SCCP] Remove dead switch cases based on range information
Determine whether switch edges are feasible based on range information,
and remove non-feasible edges lateron.

This does not try to determine whether the default edge is dead,
as we'd have to determine that the range is fully covered by the
cases for that.

Another limitation here is that we don't remove dead cases that
have the same successor as a live case. I'm not handling this
because I wanted to keep the edge removal based on feasible edges
only, rather than inspecting ranges again there -- this does not
seem like a particularly useful case to handle.

Differential Revision: https://reviews.llvm.org/D84270
2020-07-30 21:21:08 +02:00
Florian Hahn 2062b3707c [LAA] Avoid adding pointers to the checks if they are not needed.
Currently we skip alias sets with only reads or a single write and no
reads, but still add the pointers to the list of pointers in RtCheck.

This can lead to cases where we try to access a pointer that does not
exist when grouping checks.  In most cases, the way we access
PositionMap masked that, as the value would default to index 0.

But in the example in PR46854 it causes a crash.

This patch updates the logic to avoid adding pointers for alias sets
that do not need any checks. It makes things slightly more verbose, by
first checking the numbers of reads/writes and bailing out early if we don't
need checks for the alias set.

I think this makes the logic a bit simpler to follow.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D84608
2020-07-30 19:21:14 +01:00
Ettore Tiotto 36a4f10376 Fix computeHostNumPhysicalCores() for Linux on POWER and Linux on Z
ThinLTO is run using a single thread on Linux on Power. The
compute_thread_count() routine calls getHostNumPhysicalCores which
returns -1 by default, and so `MaxThreadCount is set to 1.

unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
    int MaxThreadCount = UseHyperThreads
          ? computeHostNumHardwareThreads()
          : sys::getHostNumPhysicalCores();
     if (MaxThreadCount <= 0)
        MaxThreadCount = 1;
   …
}
Fix: provide custom implementation of getHostNumPhysicalCores for
Linux on Power and Linux on Z.

Reviewed By: Kai, uweigand

Differential Revision: https://reviews.llvm.org/D84764
2020-07-30 18:05:36 +00:00
Wouter van Oortmerssen ce1eb7af9d [WebAssembly] Fixed 64-bit indices in br_table
LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated.

Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode).

Differential Revision: https://reviews.llvm.org/D84705
2020-07-30 10:52:16 -07:00
Stanislav Mekhanoshin 5b32518f96 [AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change undef source because
it is really does not matter. The fix is to stop lying to RA and
drop undef flag.

This has also hit a problem in SIFoldOperands as it can fold
immediate into an indirect move since there is no undef flag
anymore. That results in multiple test failures, so added the
check for this case.

Differential Revision: https://reviews.llvm.org/D84899
2020-07-30 10:41:59 -07:00
Simon Pilgrim 4a161bd8b3 LoopUnroll.cpp - pass std::vector by const reference to needToInsertPhisForLCSSA helper. NFCI.
Avoid an unnecessary pass by value.
2020-07-30 18:17:04 +01:00
Yuanfang Chen 555cf42f38 [NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.

Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)

This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.

This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.

The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)

Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D84774
2020-07-30 10:07:57 -07:00
Craig Topper 3632f765dc [WebAssembly] Fix GCC 5 build.
Hans' speculative fix in b7292f2db0
didn't work for me. This seems to.
2020-07-30 10:00:28 -07:00
Hiroshi Yamauchi 3d6f53018f [PGO] Include the mem ops into the function hash.
To avoid hash collisions when the only difference is in mem ops.
2020-07-30 09:26:20 -07:00
hsmahesha 33fd4a18e7 [AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic
Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as
follows. The existing heuristic roughly summarizes as below:

* Assume, all the mem ops instructions participating in the clustering process,  loads/stores same num bytes
* If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes
* If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes
* If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes

So, we need to make sure that the new heuristic do not completey deviate away from the above one, and it
properly handles both the sub-word loads and the wide loads.

Reviewed By: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84354
2020-07-30 21:41:13 +05:30
Brendon Cahoon 7b114446c3 Align store conditional address
In cases where the alignment of the datatype is smaller than
expected by the instruction, the address is aligned. The aligned
address is used for the load, but wasn't used for the store
conditional, which resulted in a run-time alignment exception.
2020-07-30 10:42:00 -05:00
Fangrui Song d2c2248722 [X86] Parse and ignore .arch directives
We parse .arch so that some `.arch i386; .code32` code can assemble. It seems
that X86AsmParser does not do a good job tracking what features are needed to
assemble instructions. GNU as's x86 port supports a very wide range of .arch
operands. Ignore the operand for now.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D84900
2020-07-30 08:30:06 -07:00
Johannes Doerfert 19756ef53a [OpenMP][IRBuilder] Support allocas in nested parallel regions
We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.

Reviewed By: fghanim, SouraVX

Differential Revision: https://reviews.llvm.org/D82470
2020-07-30 10:19:39 -05:00
Momchil Velikov ef4e665435 [AArch64] Fix operand definitions of XPACI/XPACD
The operand to these instructions is both input and output.

These are not yet emitted by the compiler and the assembler already
works fine, so can't test in this patch.  But D75044 will use XPACI
and provide test coverage for this patch as well.

Differential Revision: https://reviews.llvm.org/D84298
2020-07-30 15:31:44 +01:00
Simon Pilgrim 6316b0023e Attributor.h - remove unnecessary includes. NFCI.
Fix implicit cpp include dependencies.
2020-07-30 15:26:41 +01:00
Hans Wennborg b7292f2db0 Speculative GCC 5 build fix
It's complaining about specializing the template in a different namespace.
2020-07-30 16:12:52 +02:00
jasonliu 04dc9691eb [XCOFF][AIX] Enable -ffunction-sections
Summary:
This patch implements -ffunction-sections on AIX.
This patch focuses on assembly generation.
Follow-on patch needs to handle:
1. -ffunction-sections implication for jump table.
2. Object file generation path and associated testing.

Differential Revision: https://reviews.llvm.org/D83875
2020-07-30 13:30:01 +00:00
David Green 1da0c47fa2 [LoopVectorizer] Don't create unused block masks for reductions. NFC
This removes some unneeded block masks when we don't have any
reductions. It should not have any effect on codegen as the values
created are dead anyway.

Differential Revision: https://reviews.llvm.org/D81415
2020-07-30 14:28:08 +01:00
Florian Hahn 59d6e814ce Revert "[IPConstProp] Remove and move tests to SCCP."
This reverts commit e77624a3be.

Looks like some clang tests manually invoke -ipconstprop via opt.....
2020-07-30 13:06:54 +01:00
Florian Hahn e77624a3be [IPConstProp] Remove and move tests to SCCP.
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.

This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at much higher compile-time cost).But
IPSCCP can be thought to do so as well for internal functions and as mentioned
earlier, the pass seems unused in practice (and there are no plans on working
towards enabling it anytime).

Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84447
2020-07-30 12:36:27 +01:00
Simon Pilgrim cc529285fd VectorUtils.h - reduce unnecessary includes. NFC.
Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies.

Reduce SmallSet.h include to SmallVector.h include.
2020-07-30 12:27:49 +01:00
Simon Pilgrim 2dec72ba5c [X86][SSE] combineExtractWithShuffle - extend extract(truncate(x),0) for any source vector size
As long as we can extract the lowest 128-bit subvector from the pre-truncated source vector, then we don't care what size it is.

The next stage will be to support non-zero extraction indices, as long as its still coming from the lowest 128-bit subvector.
2020-07-30 12:27:49 +01:00
Xing GUO 3da6a974db [DWARFYAML] Make the 'Length' field of the address range table optional.
This patch makes the 'Length' field of the address range table optional.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84911
2020-07-30 17:42:18 +08:00
Xing GUO 006f6f8ac6 [DWARFYAML] Make the 'AddressSize', 'SegmentSelectorSize' fields optional.
This patch makes the 'AddressSize' and 'SegmentSelectorSize' fields of
address range table optional.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84907
2020-07-30 17:39:58 +08:00
Sam Tebbs 276ed5f7e4 [DAGCombiner] Fold sext_inreg of a masked load into a sign extended masked load
This patch adds a DAG combine fold for a sext(masked_load) into a sign extended masked load.

Differential Revision: https://reviews.llvm.org/D84332
2020-07-30 10:34:02 +01:00
Kang Zhang 0037a5f894 [PHIElimination] Fix the killed flag for LowerPHINode()
Summary:
In the phi-node-elimination pass, we set the killed flag incorrectly.
When we eliminate the PHI node, we replace the PHI with a copy for the
incoming value.

Before this patch, we will set incoming value as killed(PHICopy). And
we will remove the killed flag from last using incoming value(OldKill).
This is correct, only if the new PHICopy is after the OldKill.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D80886
2020-07-30 08:18:50 +00:00
David Sherwood 23ad660b5d [SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types
When building code at -O0 We weren't falling back to DAG ISel correctly
when encountering alloca instructions with scalable vector types. This
is because the alloca has no operands that are scalable. I've fixed this by
adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca
instructions with scalable types.

Differential Revision: https://reviews.llvm.org/D84746
2020-07-30 08:40:53 +01:00
Craig Topper 07bb8240a0 [X86] Pass the OperandVector to ParseMemOperand instead of returning the operand. NFCI
Continue the change made to ParseATTOperand to take the vector by
reference. Let ParseMemOperand add its memory operand to the
vector and just return true/false to indicate error.
2020-07-29 23:44:56 -07:00
Craig Topper 17597442db [X86] Don't pass some many parameters to ParseMemOperand by reference.
Pointers and SMLocs are cheap to copy. Even though the function
modifies some of these the caller doesn't use them after the call.
2020-07-29 23:44:56 -07:00
Serge Pavlov 032ed39def [Support] Class to facilitate file locking
This change define RAII class `FileLocker` and methods `lock` and
`tryLockFor` of the class `raw_fd_stream` to facilitate using file locks.

Differential Revision: https://reviews.llvm.org/D79066
2020-07-30 13:42:20 +07:00
Max Kazantsev 3678ad88a6 [NFC] Remove unused variable 2020-07-30 13:32:15 +07:00
Craig Topper 9611ee5f40 [X86] Teach the assembler parser to handle a '*' between segment register and base/index/displacement part of an address
A '*' after the segment is equivalent to a '*' before the segment register. To make the AsmMatcher table work we need to place the '*' token into the operand vector before the full memory operand. To accomplish this I've modified some portions of operand parsing to expose the operand vector to ParseATTOperand so that the token can be pushed to the vector after parsing the segment register and before creating the memory operand using that segment register.

Fixes PR46879

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D84895
2020-07-29 21:15:04 -07:00
Kang Zhang a18953c1c0 [PowerPC] Fix RM operands for some instructions
Summary:
Some instructions have set the wrong [RM] flag, this patch is to fix it.

Instructions x(v|s)r(d|s)pi[zmp]? and fri[npzm] use fixed rounding
directions without referencing current rounding mode.

Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb,
MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D81360
2020-07-30 02:10:49 +00:00
Matt Arsenault 7d0b32c268 GlobalISel: Use result of find rather than rechecking map 2020-07-29 21:26:20 -04:00
Matt Arsenault 66c572af55 GlobalISel: Handle assorted no-op intrinsics
SelectionDAGBuilder just drops these, so do the same.
2020-07-29 21:26:20 -04:00
Juneyoung Lee 111a02decd [JumpThreading] Fold br(freeze(undef))
This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction
is only used by the branch.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84818
2020-07-30 09:38:50 +09:00
Matt Arsenault 0da582d9b6 GlobalISel: Handle llvm.roundeven
I still think it's highly questionable that we have two intrinsics
with identical behavior and only vary by the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
2020-07-29 20:01:12 -04:00
Mircea Trofin 71059257bd [llvm][NFC] TensorSpec abstraction for ML evaluator
Further abstracting the specification of a tensor, to more easily
support different types and shapes of tensor, and also to perform
initialization up-front, at TFModelEvaluator construction time.

Differential Revision: https://reviews.llvm.org/D84685
2020-07-29 16:29:21 -07:00
Hiroshi Yamauchi ae7589e1f1 Revert "[PGO] Include the mem ops into the function hash."
This reverts commit 120e66b341.

Due to a buildbot failure.
2020-07-29 15:04:57 -07:00
Craig Topper b1c1825b99 [X86] Remove unused argument from HandleAVX512Operand in the assembly parser. 2020-07-29 14:23:01 -07:00
Sanjay Patel fef513f5cc [InstSimplify] fold min/max intrinsic with undef operand 2020-07-29 17:03:50 -04:00
Sanjay Patel 5cd695dd7f [InstSimplify] fold min/max with opposite of limit value 2020-07-29 17:03:50 -04:00
Hiroshi Yamauchi 120e66b341 [PGO] Include the mem ops into the function hash.
To avoid hash collisions when the only difference is in mem ops.

Differential Revision: https://reviews.llvm.org/D84782
2020-07-29 13:59:40 -07:00
Philip Reames 755f91f12c [Statepoint] Enable cross block relocates w/vreg lowering
This change is mechanical, it just removes the restriction and updates tests.  The key building blocks were submitted in 31342eb and 8fe2abc.

Note that this (and preceeding changes) entirely subsumes D83965.  I did includes a couple of it's tests.

From the codegen changes, an interesting observation: this doesn't actual reduce spilling, it just let's the register allocator do it's job.  That results in a slightly different overall result which has both pros and cons over the eager spill lowering.  (i.e. We'll have some perf tuning to do once this is stable.)
2020-07-29 13:32:51 -07:00
Nikita Popov 897bdca4b8 [ConstantRange] Add API for intrinsics (NFC)
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
   constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
   ConstantRange, not any consumers.

Differential Revision: https://reviews.llvm.org/D84587
2020-07-29 22:16:27 +02:00
Simon Pilgrim a1c9529e60 [X86][AVX] isHorizontalBinOp - relax no-lane-crossing limit for AVX1-only targets.
Instead of never accepting v8f32/v4f64 FHADD/FHSUB if the input shuffle masks cross lanes, perform the matching and determine if the post shuffle mask simplifies to a 'whole lane shuffle' mask - in which case we are guaranteed to cheaply perform this as a VPERM2F128 shuffle.
2020-07-29 20:49:10 +01:00
Florian Hahn f75564ad4e Reland "[SCEVExpander] Add option to preserve LCSSA directly."
This reverts the revert commit dc28675768.

It includes a fix for Polly, which uses SCEVExpander on IR that is not
in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do
not introduce LCSSA phis where there were none before.
2020-07-29 20:41:53 +01:00
Stanislav Mekhanoshin decfdb8ce3 [AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC. 2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin 13b63be472 [AMDGPU] prefer non-mfma in post-RA schedule
MFMA instructions shall not be scheduled back to back
to avoid MAI SIMD stall. Tell post-RA schedule we would
prefer some other instruction instead.

Differential Revision: https://reviews.llvm.org/D84883
2020-07-29 12:17:50 -07:00
Baptiste Saleil 7aaa85627b [PowerPC] Add options to control paired vector memops support
Adds frontend and backend options to enable and disable the
PowerPC paired vector memory operations added in ISA 3.1.
Instructions using these options will be added in subsequent patches.

Differential Revision: https://reviews.llvm.org/D83722
2020-07-29 14:00:53 -05:00
Matt Morehouse e2d0b44a7c [DFSan] Add efficient fast16labels instrumentation mode.
Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels.  The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D84371
2020-07-29 18:58:47 +00:00
Amara Emerson 0c0e36061a [GlobalISel] Add G_INTRINSIC_LRINT and translate from llvm.lrint
Differential Revision: https://reviews.llvm.org/D84551
2020-07-29 11:51:04 -07:00
Philip Reames 8fe2abc190 [Statepoint] Consolidate relocation type tracking [NFC]
Change the way we track how a particular pointer was relocated at a statepoint in selection dag.  Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering.  Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.

This is submitted separately as it really does make the code more readible on it's own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo.  This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.
2020-07-29 11:45:31 -07:00
Amara Emerson d8ba622209 [AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.

Differential Revision: https://reviews.llvm.org/D84866
2020-07-29 11:41:37 -07:00
Matt Arsenault 59fac51ff2 AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant 2020-07-29 14:24:21 -04:00
Florian Hahn dc28675768 Revert "[SCEVExpander] Add option to preserve LCSSA directly."
This reverts commit 99166fd4fb, because it
breaks the polly builders.

polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails
because a apparently unnecessary LCSSA phi node is introduced.

Make the bots green again, while I take a closer look.
2020-07-29 19:19:04 +01:00
Matt Arsenault 0b7de7966f GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT
Use the basic store to stack and reload.
2020-07-29 14:16:28 -04:00
Jessica Paquette 7ff9575594 [AArch64][GlobalISel] Select XRO addressing mode with wide immediates
Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO.

If we have a wide immediate which can't be represented in an add, we can end up
with code like this:

```
mov  x0, imm
add x1, base, x0
ldr  x2, [x1, 0]
```

If we use the [base, xN] addressing mode instead, we can produce this:

```
mov  x0, imm
ldr  x2, [base, x0]
```

This saves 0.4% code size on 7zip at -O3, and gives a geomean code size
improvement of 0.1% on CTMark.

Differential Revision: https://reviews.llvm.org/D84784
2020-07-29 11:02:10 -07:00
Matt Arsenault 766cb615a3 AMDGPU: Relax restriction on folding immediates into physregs
I never completed the work on the patches referenced by
f8bf7d7f42, but this was intended to
avoid folding immediate writes into m0 which the coalescer doesn't
understand very well. Relax this to allow simple SGPR immediates to
fold directly into VGPR copies. This pattern shows up routinely in
current GlobalISel code since nothing is smart enough to emit VGPR
constants yet.
2020-07-29 14:01:53 -04:00
Matt Arsenault 90b76dac57 GloblaISel: Remove unreachable condition
Fixes bug 46882
2020-07-29 13:42:22 -04:00
Heejin Ahn 276f9e8cfa [WebAssembly] Fix getBottom for loops
When it was first created, CFGSort only made sure BBs in each
`MachineLoop` are sorted together. After we added exception support,
CFGSort now also sorts BBs in each `WebAssemblyException`, which
represents a `catch` block, together, and
`Region` class was introduced to be a thin wrapper for both
`MachineLoop` and `WebAssemblyException`.

But how we compute those loops and exceptions is different.
`MachineLoopInfo` is constructed using the standard loop computation
algorithm in LLVM; the definition of loop is "a set of BBs that are
dominated by a loop header and have a path back to the loop header". So
even if some BBs are semantically contained by a loop in the original
program, or in other words dominated by a loop header, if they don't
have a path back to the loop header, they are not considered a part of
the loop. For example, if a BB is dominated by a loop header but
contains `call abort()` or `rethrow`, it wouldn't have a path back to
the header, so it is not included in the loop.

But `WebAssemblyException` is wasm-specific data structure, and its
algorithm is simple: a `WebAssemblyException` consists of an EH pad and
all BBs dominated by the EH pad. So this scenario is possible: (This is
also the situation in the newly added test in cfg-stackify-eh.ll)

```
Loop L: header, A, ehpad, latch
Exception E: ehpad, latch, B
```
(B contains `abort()`, so it does not have a path back to the loop
header, so it is not included in L.)

And it is sorted in this order:
```
header
A
ehpad
latch
B
```

And when CFGStackify places `end_loop` or `end_try` markers, it
previously used `WebAssembly::getBottom()`, which returns the latest BB
in the sorted order, and placed the marker there. So in this case the
marker placements will be like this:
```
loop
  header
  try
    A
  catch
    ehpad
    latch
end_loop         <-- misplaced!
    B
  end_try
```
in which nesting between the loop and the exception is not correct.
`end_loop` marker has to be placed after `B`, and also after `end_try`.

Maybe the fundamental way to solve this problem is to come up with our
own algorithm for computing loop region too, in which we include all BBs
dominated by a loop header in a loop. But this takes a lot more effort.
The only thing we need to fix is actually, `getBottom()`. If we make it
return the right BB, which means in case of a loop, the latest BB of the
loop itself and all exceptions contained in there, we are good.

This renames `Region` and `RegionInfo` to `SortRegion` and
`SortRegionInfo` and extracts them into their own file. And add
`getBottom` to `SortRegionInfo` class, from which it can access
`WebAssemblyExceptionInfo`, so that it can compute a correct bottom
block for loops.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D84724
2020-07-29 10:36:32 -07:00
Craig Topper c4823b24a4 [X86] Add custom lowering for llvm.roundeven with sse4.1.
We can use the roundss/sd/ps/pd instructions like we do for
ceil/floor/trunc/rint/nearbyint.

Differential Revision: https://reviews.llvm.org/D84592
2020-07-29 10:23:08 -07:00
Craig Topper 3efc978bae [LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable
This patch adds support for vectorizing these intrinsics.

Differential Revision: https://reviews.llvm.org/D84796
2020-07-29 10:23:07 -07:00
Arthur Eubanks 71d0a2b8a3 [DFSan][NewPM] Port DataFlowSanitizer to NewPM
Reviewed By: ychen, morehouse

Differential Revision: https://reviews.llvm.org/D84707
2020-07-29 10:19:15 -07:00
Sanjay Patel ee9617e96b [InstSimplify] try constant folding intrinsics before general simplifications
This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.

I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
2020-07-29 13:18:40 -04:00
Simon Pilgrim fdc902774e [DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG
Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG

Differential Revision: https://reviews.llvm.org/D84863
2020-07-29 18:10:59 +01:00
Roman Lebedev 1d51dc38d8
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Kang Zhang 802c043078 [PowerPC] Set v1i128 to expand for SETCC to avoid crash
Summary:
PPC only supports the instruction selection for v16i8, v8i16, v4i32,
v2i64, v4f32 and v2f64 for ISD::SETCC, don't support the v1i128, so
v1i128 for ISD::SETCC will crash.

This patch is to set v1i128 to expand to avoid crash.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84238
2020-07-29 16:39:27 +00:00
Philip Reames 31342eb63e [Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC]
This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.

The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.

Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into it's own change for easy of understanding and fault isolation.

The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and it's source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.

I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.

Differential Revision: https://reviews.llvm.org/D84692
2020-07-29 09:23:52 -07:00
Sanjay Patel 3e8534fbc6 [InstSimplify] allow partial undef constants for vector min/max folds 2020-07-29 11:53:41 -04:00
Sanjay Patel 3c20ede18b [InstSimplify] fold integer min/max intrinsic with same args 2020-07-29 11:53:41 -04:00
Kang Zhang a4ade9ed21 [MachineVerifier] Handle the PHI node for verifyLiveVariables()
Summary:
When doing MachineVerifier for LiveVariables, the MachineVerifier pass
will calculate the LiveVariables, and compares the result with the
result livevars pass gave. If they are different, verifyLiveVariables()
will give error.

But when we calculate the LiveVariables in MachineVerifier, we don't
consider the PHI node, while livevars considers.

This patch is to fix above bug.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D80274
2020-07-29 15:43:47 +00:00
Matt Arsenault d42c7b2211 AMDGPU: Account for the size of LDS globals used through constant
expressions.

Also "fix" the longstanding bug where the computed size depends on the
order of the visitation. We could try to predict the allocation order
used by legalization, but it would never be 100% perfect. Until we
start fixing the addresses somehow (or have a more reliable allocation
scheme later), just try to compute the size based on the worst case
padding.
2020-07-29 11:40:42 -04:00
David Sherwood 9ad7c980bb [SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock
In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.

Differential Revision: https://reviews.llvm.org/D83542
2020-07-29 16:29:19 +01:00
Yuanfang Chen 7a2e1122ae [NewPM][PassInstrument] Make PrintIR and TimePasses to use before-pass-run callback
Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D84773
2020-07-29 08:26:36 -07:00
Johannes Doerfert ee05167cc4 [OpenMP] Allow traits for the OpenMP context selector `isa`
It was unclear what `isa` was supposed to mean so we did not provide any
traits for this context selector. With this patch we will allow *any*
string or identifier. We use the target attribute and target info to
determine if the trait matches. In other words, we will check if the
provided value is a target feature that is available (at the call site).

Fixes PR46338

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83281
2020-07-29 10:22:27 -05:00
Simon Wallis 6a05c6bfc8 [MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef
In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.

The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.

A new test is added.  This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
        umull   r9, r9, lr, r0

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D82638
2020-07-29 16:21:01 +01:00
Xing GUO bfa140376d [DWARFYAML] Make the field names consistent with the DWARF spec. NFC.
This patch replaces 'AddrSize'/'SegSize' with
'AddressSize'/'SegmentSelectorSize'. NFC.
2020-07-29 23:10:08 +08:00
Sanjay Patel 9ee7d7122c [ConstantFolding] fold integer min/max intrinsics
If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
2020-07-29 11:01:13 -04:00
Simon Pilgrim d1abca187d [CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics 2020-07-29 15:55:43 +01:00
Florian Hahn 99166fd4fb [SCEVExpander] Add option to preserve LCSSA directly.
This patch teaches SCEVExpander to directly preserve LCSSA.

As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.

To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.

The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D71538
2020-07-29 15:07:37 +01:00
Simon Pilgrim 0a0f28254a [CostModel][X86] Add SSE costs for ABS intrinsics 2020-07-29 14:33:59 +01:00
Victor Campos d1a3396bfb [Driver][ARM] Disable unsupported features when nofp arch extension is used
A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:

 - -mfloat-abi=soft
 - -mfpu=none

This option list is missing, however, the extension "+nofp" that can be
specified in -march flags, such as "-march=armv8-a+nofp".

This patch also disables unsupported target features when nofp is passed
to -march.

Differential Revision: https://reviews.llvm.org/D82948
2020-07-29 14:13:22 +01:00
David Green 9ddb28964c [ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores
This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163
2020-07-29 13:41:34 +01:00
David Green 60280e9818 [Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 13:32:53 +01:00
David Sherwood 2078771759 [SVE][CodeGen] Add simple integer add tests for SVE tuple types
I have added tests to:

  CodeGen/AArch64/sve-intrinsics-int-arith.ll

for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:

1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.

Differential revision: https://reviews.llvm.org/D84016
2020-07-29 13:32:10 +01:00
Sjoerd Meijer 85342c27a3 [ARM] Optimize immediate selection
Optimize some specific immediates selection by materializing them with sub/mvn
instructions as opposed to loading them from the constant pool.

Patch by Ben Shi, powerman1st@163.com.

Differential Revision: https://reviews.llvm.org/D83745
2020-07-29 13:29:17 +01:00
Matt Arsenault 200bb5191a AMDGPU/GlobalISel: Refactor special argument management 2020-07-29 08:27:31 -04:00
Matt Arsenault c230965ccf AMDGPU: Make saturating add/sub legal for DAG path 2020-07-29 08:27:31 -04:00
Matt Arsenault cdd45d5f9c AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub
Remove the custom node boilerplate. Not sure why this tried to handle
the LDS atomic stuff.
2020-07-29 08:27:31 -04:00
David Sherwood 5d84eafc6b [CodeGen] Remove calls to getVectorNumElements in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.

Fixes warnings in this test:

  sve-sext-zext.ll

Differential revision: https://reviews.llvm.org/D83198
2020-07-29 13:05:39 +01:00
Yevgeny Rouban 5d6cd61904 [LoopSimplifyCFG] Delete landing pads in dead exit blocks
In addition to removing phi nodes this patch removes any
landing pad that the dead exit block might have. Without
this fix Verifier complains about a new switch instruction
jumps to a block with a landing pad.

Differential Revision: https://reviews.llvm.org/D84320
2020-07-29 18:36:51 +07:00
Simon Pilgrim 0c005be6eb [X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks
If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast.

getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements.

I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.
2020-07-29 11:32:44 +01:00
Ikhlas Ajbar d50d4c3d44 [Hexagon] Correct the order of operands when lowering funnel shift-left
This patch corrects the order of operands in the pattern that lowers fshl
in Hexagon.
2020-07-28 21:22:41 -05:00
Chuanqi Xu dd4106d22e [NFC] Edit the comment in User::replaceUsesOfWith 2020-07-29 10:02:04 +08:00
Kang Zhang 00046d789c [PowerPC] Add Def CR1 for MTFSFI_rec and MTFSF_rec 2020-07-29 01:47:23 +00:00
Matt Arsenault 44211f20a8 AMDGPU: Optimize copies to exec with other insts after exec def
It's possible to have terminator instructions after a write to exec,
so skip over them to find it.
2020-07-28 21:34:50 -04:00
Matt Arsenault b6ebc77326 AMDGPU/GlobalISel: Fix selecting llvm.amdgcn.s.getreg
This introduces the same bug llvm.amdgcn.s.setreg has where if the
user specified an immediate outside of the valid 16-bit range, it will
select into a verifier error.
2020-07-28 21:34:50 -04:00
Thomas Lively 11bb7eef41 [WebAssembly] Remove intrinsics for SIMD widening ops
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.

Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.

Differential Revision: https://reviews.llvm.org/D84556
2020-07-28 18:25:55 -07:00
Craig Topper 06cf6f770d [X86] Add FeatureCMPXCHG8B and FeatureSlowUAMem16 to 'lakemont' in X86.td
We already had CMPXCH8B feature on this CPU for the frontend so
this doesn't have much effect.

The FeatureSlowUAMem16 only matters if someone compiles with
-march=lakemont -msse which doesn't make sense, but is consistent
with all our pre-sse4.2 CPUs. Maybe the feature flag should be
FeatureFastUAMem16 and set on the newer CPUs instead.
2020-07-28 18:24:46 -07:00
Matt Arsenault 6a7b6dd54b AMDGPU: Don't assert in canInsertSelect
Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so
this hit a case where it was queried with a VGPR and SGPR. This could
arguably be a verifier error, but it's currently not.
2020-07-28 21:01:06 -04:00
Thomas Lively ffd8c23ccb [WebAssembly] Implement truncating vector stores
Rather than expanding truncating stores so that vectors are stored one
lane at a time, lower them to a sequence of instructions using
narrowing operations instead, when possible. Since the narrowing
operations have saturating semantics, but truncating stores require
truncation, mask the stored value to manually truncate it before
narrowing. Also, since narrowing is a binary operation, pass in the
original vector as the unused second argument.

Differential Revision: https://reviews.llvm.org/D84377
2020-07-28 17:46:45 -07:00
Matt Arsenault 068808d102 AMDGPU: Don't assume call targets are registers
GlobalISel let through a call to null, which would then fold into the
source operand like any other inline immediate. The SelectionDAG
lowering deletes calls to null and undef as a workaround from before
calls were supported. We should probably drop the special handling
case in the DAG lowering now, since the middle end optimizers delete
null calls anyway.
2020-07-28 20:46:06 -04:00
Matt Arsenault 8860daf0ed AMDGPU: Handle a few missing cases in getAddrModeArguments 2020-07-28 20:22:38 -04:00
Matt Arsenault b3e63aa8a4 AMDGPU: Don't assume there is only one terminator copy
This would stop on the first in reverse order, failing the verifier if
there were more earlier in the block.
2020-07-28 20:22:38 -04:00
Matt Arsenault 592f2e8d1c AMDGPU: Fix verifier error on spilling partially defined SGPRs
This needs an implicit def of the super-register in case one of the
lanes isn't defined, similar to copyPhysReg (or the not-VGPR spill
case below). This showed up in GlobalISel testing since it currently
doesn't fold out many undef instructions.
2020-07-28 20:01:57 -04:00
Matt Arsenault 66d60e06cb AMDGPU: Serialize MFI spill fields
These should probably be inferred from the function on parse, but the
target specific infrastructure currently does not give you a way to do
this. SILowerSGPRSpills early exits without this reporting spills,
which makes it difficult to write a MIR test for.
2020-07-28 20:01:57 -04:00
Joel E. Denny 9f86b8ec41 [FileCheck] Report captured variables
Report captured variables in input dumps and traces.  For example:

```
$ cat check
CHECK: hello [[WHAT:[a-z]+]]
CHECK: goodbye [[WHAT]]

$ FileCheck -dump-input=always -vv check < input |& tail -8
<<<<<<
           1: hello world
check:1'0     ^~~~~~~~~~~
check:1'1           ^~~~~ captured var "WHAT"
           2: goodbye world
check:2'0     ^~~~~~~~~~~~~
check:2'1                   with "WHAT" equal to "world"
>>>>>>

$ FileCheck -dump-input=never -vv check < input
check2:1:8: remark: CHECK: expected string found in input
CHECK: hello [[WHAT:[a-z]+]]
       ^
<stdin>:1:1: note: found here
hello world
^~~~~~~~~~~
<stdin>:1:7: note: captured var "WHAT"
hello world
      ^~~~~
check2:2:8: remark: CHECK: expected string found in input
CHECK: goodbye [[WHAT]]
       ^
<stdin>:2:1: note: found here
goodbye world
^~~~~~~~~~~~~
<stdin>:2:1: note: with "WHAT" equal to "world"
goodbye world
^
```

Reviewed By: thopre

Differential Revision: https://reviews.llvm.org/D83651
2020-07-28 19:15:18 -04:00
Joel E. Denny d680711b94 [FileCheck] Extend -dump-input with substitutions
Substitutions are already reported in the diagnostics appearing before
the input dump in the case of failed directives, and they're reported
in traces (produced by `-vv -dump-input=never`) in the case of
successful directives.  However, those reports are not always
convenient to view while investigating the input dump, so this patch
adds the substitution report to the input dump too.  For example:

```
$ cat check
CHECK: hello [[WHAT:[a-z]+]]
CHECK: [[VERB]] [[WHAT]]

$ FileCheck -vv -DVERB=goodbye check < input |& tail -8
<<<<<<
           1: hello world
check:1       ^~~~~~~~~~~
           2: goodbye word
check:2'0     X~~~~~~~~~~~ error: no match found
check:2'1                  with "VERB" equal to "goodbye"
check:2'2                  with "WHAT" equal to "world"
>>>>>>
```

Without this patch, the location reported for a substitution for a
directive match is the directive's full match range.  This location is
misleading as it implies the substitution itself matches that range.
This patch changes the reported location to just the match range start
to suggest the substitution is known at the start of the match.  (As
in the above example, input dumps don't mark any range for
substitutions.  The location info in that case simply identifies the
right line for the annotation.)

Reviewed By: mehdi_amini, thopre

Differential Revision: https://reviews.llvm.org/D83650
2020-07-28 19:15:18 -04:00
Zahira Ammarguellat 80bd6ae13e On Windows build, making the /bigobj flag global , instead of passing it per file.
To avoid having this flag be passed in per/file manner, we are instead
passing it globally.

This fixes this bug: https://bugs.llvm.org/show_bug.cgi?id=46733

Reviewed-by: aaron.ballman, beanz, meinersbur

Differential Revision: https://reviews.llvm.org/D84038
2020-07-28 18:04:36 -05:00
Johannes Doerfert 450dc09d69 [SROA][Mem2Reg] Use efficient droppable use API (after D83976)
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84804
2020-07-28 17:41:01 -05:00
Daniel Sanders abf1ed70d6 [globalisel][cse] Merge debug locations when CSE'ing
Reviewed By: aditya_nandakumar

Differential Revision: https://reviews.llvm.org/D78388
2020-07-28 14:25:26 -07:00
Matt Arsenault e87356b498 GlobalISel: Don't assert on operations with no type indices
Fix not marking G_FENCE as legal on AMDGPU This was apparently
defaulting to legal using the "legacy" rules, whatever those are.
2020-07-28 16:49:55 -04:00
Matt Arsenault 5174e7b443 GlobalISel: Add typeIsNot LegalityPredicate
This allows sorting the legal/custom rules first as is recommended
2020-07-28 16:49:55 -04:00
Matt Arsenault 9731ef3ec5 AMDGPU/GlobalISel: Add SReg_96 to SGPRRegBank 2020-07-28 16:49:55 -04:00
Matt Arsenault e9b236f411 AMDGPU: Check for other defs when folding conditions into s_andn2_b64
We can't fold the masked compare value through the select if the
select condition is re-defed after the and instruction. Fixes a
verifier error and trying to use the outgoing value defined in the
block.

I'm not sure why this pass is bothering to handle physregs. It's
making this more complex and forces extra liveness computation.
2020-07-28 16:36:23 -04:00
Roman Lebedev e1dd212c87
[X86] Remove disabled miscompiling X86CondBrFolding pass
As briefly discussed in IRC with @craig.topper,
the pass is disabled basically since it's original introduction (nov 2018)
due to  known correctness issues (miscompilations),
and there hasn't been much work done to fix that.

While i won't promise that i will "fix" the pass,
i have looked at it previously, and i'm sure i won't try to fix it
if that requires actually fixing this existing code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D84775
2020-07-28 23:35:04 +03:00
Mircea Trofin 1e027b77f0 [llvm][NFC] refactor setBlockFrequency for clarity.
The refactoring encapsulates frequency calculation in MachineBlockFrequencyInfo,
and renames the API to clarify its motivation. It should clarify
frequencies may not be reset 'freely' by users of the analysis, as the
API serves as a partial update to avoid a full analysis recomputation.

Differential Revision: https://reviews.llvm.org/D84427
2020-07-28 13:04:11 -07:00
Sanjay Patel 3fb13b8484 [InstSimplify] allow undefs in icmp with vector constant folds
This is the main icmp simplification shortcoming seen in D84655.

Alive2 agrees that the basic examples are correct at least:

define <2 x i1> @src(<2 x i8> %x) {
%0:
  %r = icmp sle <2 x i8> { undef, 128 }, %x
  ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
  ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!

define <2 x i1> @src(<2 x i32> %X) {
%0:
  %A = or <2 x i32> %X, { 63, 63 }
  %B = icmp ult <2 x i32> %A, { undef, 50 }
  ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
  ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/omt2ee
https://alive2.llvm.org/ce/z/GW4nP_

Differential Revision: https://reviews.llvm.org/D84762
2020-07-28 15:13:53 -04:00
Craig Topper 69152a11cf [X86] Merge the two 'Emit the normal disp32 encoding' cases in SIB byte handling in emitMemModRMByte. NFCI
By repeating the Disp.isImm() check in a couple spots we can
make the normal case for immediate and for expression the same.
And then always rely on the ForceDisp32 flag to remove a later
non-zero immediate check.

This should make {disp32} pseudo prefix handling
slightly easier as we need the normal disp32 handler to handle a
immediate of 0.
2020-07-28 12:12:09 -07:00
Sanjay Patel f75cf240d6 [InstCombine] avoid crashing on vector constant expression (PR46872) 2020-07-28 15:02:36 -04:00
jasonliu f8ab66538c [NFC][XCOFF] Use getFunctionEntryPointSymbol from TLOF to simplify logic
Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D84693
2020-07-28 18:59:51 +00:00
Simon Pilgrim b4b6e77454 [DAG] isSplatValue - add support for TRUNCATE/SIGN_EXTEND/ZERO_EXTEND
These are just pass-throughs to the source operand - we can't assume that ANY_EXTEND(splat) will still be a splat though.
2020-07-28 19:56:11 +01:00
Simon Pilgrim 4838cd46a9 [X86][XOP] Shuffle v16i8 using VPPERM(X,Y) instead of OR(PSHUFB(X),PSHUFB(Y)) 2020-07-28 19:56:10 +01:00
Austin Kerbow adeeac9d5a [AMDGPU] Spill CSR VGPR which is reserved for SGPR spills
Update logic for reserving VGPR for SGPR spills. A CSR VGPR being reserved for
SGPR spills could be clobbered if there were no free lower VGPR's available.
Create a stack object so that it will be spilled in the prologue. Also
adds more tests.

Differential Revision: https://reviews.llvm.org/D83730
2020-07-28 11:53:02 -07:00
Juneyoung Lee 4c9af6d0e0 [JumpThreading] Add a basic support for freeze instruction
This patch adds a basic support for freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.

Reviewed By: efriedma, nikic

Differential Revision: https://reviews.llvm.org/D84598
2020-07-29 03:12:14 +09:00
Craig Topper 91b8c1fd0f [X86] Simplify some code in emitMemModRMByte. NFCI 2020-07-28 10:46:04 -07:00
Craig Topper 6c3dc6e1d5 [X86] Merge disp8 and cdisp8 handling into a single helper function to reduce some code.
We currently handle EVEX and non-EVEX separately in two places. By sinking the EVEX
check into the existing helper for CDisp8 we can simplify these two places.

Differential Revision: https://reviews.llvm.org/D84730
2020-07-28 10:46:01 -07:00
Anna Welker 0c64233bb7 [ARM][MVE] Teach MVEGatherScatterLowering to merge successive getelementpointers
A patch following up on the introduction of pointer induction variables, adding
a preprocessing step to the address optimisation in the MVEGatherScatterLowering
pass. If the getelementpointer that is the address is itself using a
getelementpointer as base, they will be merged into one by summing up the
offsets, after checking that this will not cause an overflow (this can be
repeated recursively).

Differential Revision: https://reviews.llvm.org/D84027
2020-07-28 17:31:20 +01:00
Arthur Eubanks 2ca6c422d2 [FunctionAttrs] Rename functionattrs -> function-attrs
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84694
2020-07-28 09:09:13 -07:00
Jon Roelofs 736423af53 [OldPM] Print out a bit more when passes lie about changing IR
https://reviews.llvm.org/D84686
2020-07-28 10:01:24 -06:00
Matt Arsenault 97b5fb78d1 GlobalISel: Translate llvm.convert.{to|from}.fp16 intrinsics
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
2020-07-28 11:46:05 -04:00
Matt Arsenault 16bcd54570 AMDGPU/GlobalISel: Mark GlobalISel classes as final 2020-07-28 11:42:17 -04:00
Matt Arsenault bb23b5cfe0 AMDGPU/GlobalISel: Merge identical select cases 2020-07-28 11:42:17 -04:00
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Xing GUO 6784d82d5b [DWARFYAML] Rename checkListEntryOperands() to checkOperandCount(). NFC.
This patch renames checkListEntryOperands() to checkOperandCount(), so
that we are able to check DWARF expression operands using the same
function.

Reviewed By: jhenderson, labath

Differential Revision: https://reviews.llvm.org/D84624
2020-07-28 22:57:39 +08:00
Sander de Smalen cda2eb3ad2 [AArch64][SVE] Fix epilogue for SVE when the stack is realigned.
While deallocating the stackframe, the offset used to reload the
callee-saved registers was not pointing to the SVE callee-saves,
but rather to the whole SVE area.

   +--------------+
   | GRP callee   |
   |     saves    |
   +--------------+ <- FP
   | SVE callee   |
   |     saves    |
   +--------------+ <- Should restore SVE callee saves from here
   |  SVE Spills  |
   |  and Locals  |
   +--------------+ <- instead of from here.
   |              |
   :              :
   |              |
   +--------------+ <- SP

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84539
2020-07-28 15:45:53 +01:00
Sander de Smalen 26b4ef3694 [AArch64][SVE] Don't align the last SVE callee save.
Instead of aligning the last callee-saved-register slot to the stack
alignment (16 bytes), just align the SVE callee-saved block. This also
simplifies the code that allocates space for the callee-saves.

This change is needed to make sure the offset to which the callee-saved
register is spilled, corresponds to the offset used for e.g. unwind call
frame instructions.

Reviewers: efriedma, paulwalker-arm, david-arm, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84042
2020-07-28 15:45:53 +01:00
Sander de Smalen 54492a5843 [AArch64][SVE] Don't support fixedStack for SVE objects.
Fixed stack objects are preallocated and defined to be allocated before
any of the regular stack objects. These are normally used to model stack
arguments.

The AAPCS does not support passing SVE registers on the stack by value
(only by reference). The current layout also doesn't place them before
all stack objects, but rather before all SVE objects. Removing this
simplifies the code that emits the allocation/deallocation
around callee-saved registers (D84042).

This patch also removes all uses of fixedStack from from
framelayout-sve.mir, where this was used purely for testing purposes.

Reviewers: paulwalker-arm, efriedma, rengolin

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84538
2020-07-28 15:45:53 +01:00
Xing GUO 22ec861d28 [DWARFYAML] Add support for emitting custom range list content.
This patch adds support for emitting custom range list content.

We are able to handcraft a custom range list via the following syntax.

```
debug_rnglists:
  - Lists:
      - Entries:
          - Operator: DW_RLE_startx_endx
            Values:   [ 0x1234, 0x1234 ]
      - Content: '1234567890abcdef'
      - Content: 'abcdef1234567890'
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84618
2020-07-28 22:11:16 +08:00
Jinsong Ji d28f86723f Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit bf544fa1c3.

Fixed the typo in PPCInstrInfo.cpp.
2020-07-28 14:00:11 +00:00
Georgii Rymar bd93f5ce07 [yaml2obj] - Add a way to override sh_type section field.
This adds the `ShType` key similar to others `Sh*` keys we have.

My use case is the following. Imagine we have a `SHT_SYMTAB_SHNDX`
section and want to hide it from a dumper. The natural way would be to
do something like:

```
  - Name:    .symtab_shndx
    Type:    [[TYPE=SHT_SYMTAB_SHNDX]]
    Entries: [ 0, 1 ]

```

and then change the TYPE from `SHT_SYMTAB_SHNDX` to something else,
for example to `SHT_PROGBITS`.

But we have a problem: regular sections does not have `Entries` key,
so yaml2obj will be unable to produce a section.

The solution is to introduce a `ShType` key to override the final type.

This is not the first time I am facing the need to change the type. I
was able to invent workarounds or solved issues differently in the past,
but finally came to conclusion that we just should support the `ShType`.

Differential revision: https://reviews.llvm.org/D84738
2020-07-28 16:16:42 +03:00
Evgeniy Brevnov 412b3932c6 [BPI] Fix memory leak reported by sanitizer bots
There is a silly mistake where release() is used instead of reset() for free resources of unique pointer.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D84747
2020-07-28 19:53:46 +07:00
Tim Northover 39108f4c7a ARM: make Thumb1 instructions non-flag-setting in IT block.
Many Thumb1 instructions are defined to set CPSR if executed outside an IT
block, but leave it alone from inside one. In MachineIR this is represented by
whether an optional register is CPSR or NoReg (0), and affects how the
instructions are printed.

This sets the instruction to the appropriate form during if-conversion.
2020-07-28 13:31:17 +01:00
Stefan Pintilie 97470897c4 [PowerPC] Split s34imm into two types
Currently the instruction paddi always takes s34imm as the type for the
34 bit immediate. However, the PC Relative form of the instruction should
not produce the same fixup as the non PC Relative form.
This patch splits the s34imm type into s34imm and s34imm_pcrel so that two
different fixups can be emitted.

Reviewed By: nemanjai, #powerpc, kamaub

Differential Revision: https://reviews.llvm.org/D83255
2020-07-28 05:55:56 -05:00
Evgeniy Brevnov 3a2b05f9fe [BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84514
2020-07-28 17:42:33 +07:00
Kai Nacke 7294ca3f6e [SystemZ/ZOS] Implement setLastAccessAndModificationTime()
The function setLastAccessAndModificationTime() uses function
futimens() or futimes() by default. Both functions are not
available in z/OS, therefore functionality is implemented using
__fchattr() on z/OS.

Reviews by: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D83945
2020-07-28 06:36:15 -04:00
Luofan Chen 5ee07dc53f [Attributor] Track AA dependency using dependency graph
This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78861
2020-07-28 18:02:49 +08:00
Serge Pavlov 536736995b [Support] Add file lock/unlock functions
This is recommit of f51bc4fb60, reverted in 8577595e03, because
the function `flock` is not available on Solaris. In this variant
`flock` was replaced with `fcntl`, which is a POSIX function.

New functions `lockFile`, `tryLockFile` and `unlockFile` implement
simple file locking. They lock or unlock entire file. This must be
enough to support simulataneous writes to log files in parallel builds.

Differential Revision: https://reviews.llvm.org/D78896
2020-07-28 16:44:23 +07:00
Simon Pilgrim 182111777b [X86][SSE] Attempt to match OP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y))
An initial backend patch towards fixing the various poor HADD combines (PR34724, PR41813, PR45747 etc.).

This extends isHorizontalBinOp to check if we have per-element horizontal ops (odd+even element pairs), but not in the expected serial order - in which case we build a "post shuffle mask" that we can apply to the HOP result, assuming we have fast-hops/optsize etc.

The next step will be to extend the SHUFFLE(HOP(X,Y)) combines as suggested on PR41813 - accepting more post-shuffle masks even on slow-hop targets if we can fold it into another shuffle.

Differential Revision: https://reviews.llvm.org/D83789
2020-07-28 10:04:14 +01:00
serge-sans-paille 3218c064d6 [legacyPM] Do not compute preserved analysis if there's no local change
All analysis are preserved if there's no local change, and thanks to
3667d87a33 this property is enforced for all
passes.

Skipping the dependency computation improves the performance when there's a lot
of small functions, where only a few change happen.

Thanks to Nikita Popov who provided this numbers (extract below)

https://llvm-compile-time-tracker.com/compare.php?from=183342c0a9850e60dd7a004b651c83dfb3a7d25e&to=f2f91e6a2743070471cc9471e4e8c646e50c653c&stat=instructions

O3: (number of instructions)
Benchmark               Old             New
kimwitu++               60783M          59968M          (-1.34%)
sqlite3                 73200M          73083M          (-0.16%)
consumer-typeset        52776M          52712M          (-0.12%)
Bullet                  133709M         132940M         (-0.58%)
tramp3d-v4              123864M         123186M         (-0.55%)
mafft                   55534M          55477M          (-0.10%)
ClamAV                  76292M          76164M          (-0.17%)
lencod                  103190M         103061M         (-0.13%)
SPASS                   64068M          63713M          (-0.55%)
7zip                    197332M         196308M         (-0.52%)
geomean                 85750M          85389M          (-0.42%)

Differential Revision: https://reviews.llvm.org/D80707
2020-07-28 11:01:04 +02:00
Roman Lebedev e40315d2b4
[GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives
While this doesn't appear to help with the perf issue being exposed by
D84108, the function as-is is very weird, convoluted, and what's worse,
recursive.

There was no need for `SpeculativelyAvaliableAndUsedForSpeculation`,
tri-state choice is enough. We don't even ever check for that state.

The basic idea here is that we need to perform a depth-first traversal
of the predecessors of the basic block in question, either finding a
preexisting state for the block in a map, or inserting a "placeholder"
`SpeculativelyAvaliable`,

If we encounter an `Unavaliable` block, then we need to give up search,
and back-propagate the `Unavaliable` state to the each successor of
said block, more specifically to the each `SpeculativelyAvaliable`
we've just created.

However, if we have traversed entirety of the predecessors and have not
encountered an `Unavaliable` block, then it must mean the value is fully
available. We could update each inserted `SpeculativelyAvaliable` into
a `Avaliable`, but we don't need to, as assertion excersizes,
because we can assume that if we see an `SpeculativelyAvaliable` entry,
it is actually `Avaliable`, because during the time we've produced it,
if we would have found that it has an `Unavaliable` predecessor,
we would have updated it's successors, including this block,
into `Unavaliable`

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D84181
2020-07-28 10:19:28 +03:00
Craig Topper 647e861e08 [X86] Detect if EFLAGs is live across XBEGIN pseudo instruction. Add it as livein to the basic blocks created when expanding the pseudo
XBEGIN causes several based blocks to be inserted. If flags are live across it we need to make eflags live in the new basic blocks to avoid machine verifier errors.

Fixes PR46827

Reviewed By: ivanbaev

Differential Revision: https://reviews.llvm.org/D84479
2020-07-27 21:15:35 -07:00
Craig Topper 25f193fb46 [X86] Add support for {disp32} to control size of jmp and jcc instructions in the assembler
By default we pick a 1 byte displacement and let relaxation enlarge it if necessary. The GNU assembler supports a pseudo prefix to basically pre-relax the instruction the larger size.

I plan to add {disp8} and {disp32} support for memory operands in another patch which is why I've included the parsing code and enum for {disp8} pseudo prefix as well.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D84709
2020-07-27 21:11:48 -07:00
Craig Topper a0ebac52df [X86] Properly encode a 32-bit address with an index register and no base register in 16-bit mode.
In 16-bit mode we can encode a 32-bit address using 0x67 prefix.
We were failing to do this when the index register was a 32-bit
register, the base register was not present, and the displacement
fit in 16-bits.

Fixes PR46866.
2020-07-27 21:11:42 -07:00
Wei Mi a23f62343c Supplement instr profile with sample profile.
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.

The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.

In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.

Differential Revision: https://reviews.llvm.org/D81981
2020-07-27 20:17:40 -07:00
Alina Sbirlea f1d4db4f0c [GraphDiff] Use class method getChildren instead of GraphTraits.
Summary:
Use getChildren() method in GraphDiff instead of GraphTraits.

This simplifies the code and allows for refactorigns inside GraphDiff.
All usecase need not have a light-weight/copyable range.
Clean GraphTraits implementation.

Reviewers: dblaikie

Subscribers: hiraditya, llvm-commits, george.burgess.iv

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84562
2020-07-27 16:12:34 -07:00
Matt Arsenault 5f802be4e5 GlobalISel: Don't fail translate on intrinsics with metadata 2020-07-27 19:00:25 -04:00
Matt Arsenault ce944af33c AMDGPU/GlobalISel: Mark G_ATOMICRMW_{NAND|FSUB} as lower
These aren't implemented and we're still relying on the AtomicExpand
pass, but mark these as lower to eliminate a few of the few remaining
no rules defined cases.
2020-07-27 18:47:40 -04:00
Matt Arsenault 8b81d0633f AMDGPU: global_atomic_csub is not always dereferenceable 2020-07-27 18:47:39 -04:00
Jonas Devlieghere f9fec0447e [llvm] Make ZLIB handling compatible with multi-configuration generators
The CMAKE_BUILD_TYPE is only meaningful to single-configuration
generators (such as make and Ninja). For multi-configuration generators
like Xcode and MSVC this variable won't be set, resulting in a CMake
error.
2020-07-27 15:37:18 -07:00
Francesco Petrogalli adb28e0fb2 [llvm][CodeGen] Addressing modes for SVE ldN.
Reviewers: c-rhodes, efriedma, sdesmalen

Subscribers: huihuiz, tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77251
2020-07-27 22:18:28 +00:00
Arthur Eubanks c37bb5e2a5 [DFSan] Remove unused DataFlowSanitizer vars
Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D84704
2020-07-27 14:59:07 -07:00
Sridhar Gopinath 4b5412b5db Fix the move constructor of MMI to move MachineFunctions map
The move constructor of MachineModuleInfo currently does not copy the
MachineFunctions map. This commit fixes this issue.

Patch by Sridhar Gopinath. Thanks!

Differential Revision: https://reviews.llvm.org/D84274
2020-07-27 14:10:05 -07:00
Jinsong Ji bf544fa1c3 Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit adffce7153.

This is breaking test-suite, revert while investigation.
2020-07-27 21:07:00 +00:00
Guillaume Chatelet 754deffd11 [NFC] Move BitcodeCommon.h from Bitstream to Bitcode 2020-07-27 20:49:17 +00:00
Arthur Eubanks beb7e3bb70 Rename t2-reduce-size -> thumb2-reduce-size
For readability and consistency with other thumb2 passes like
"thumb2-it".

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84696
2020-07-27 13:42:13 -07:00
Roman Lebedev 351d234d86
[OpenMPOpt] Most SCC's are uninteresting, don't waste time on them (up to 16x faster)
Summary:
This seems obvious in hindsight, but the result is surprising.
I've measured compile-time of `-openmpopt` pass standalone
on RawSpeed unity build, and while there is some OpenMP stuff,
most is not OpenMP. But nonetheless the pass does a lot of costly
preparations before ever trying to look for OpenMP stuff in SCC.

Numbers (n=25): 0.094624s  ->  0.005976s, an -93.68% improvement, or ~16x

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84689
2020-07-27 23:36:34 +03:00
Jinsong Ji adffce7153 [PowerPC] Remove QPX/A2Q BGQ/BGP CNK support
Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html
no one is making use of QPX/A2Q/BGQ/BGP CNK anymore.

This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang,
CNK support in openmp/polly.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83915
2020-07-27 19:24:39 +00:00
Arthur Eubanks 2a672767cc Prefix some AArch64/ARM passes with "aarch64-"/"arm-"
For consistency with other target specific passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84560
2020-07-27 11:00:39 -07:00
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Nadav Rotem df880b7730 [StackProtector] Speed up RequiresStackProtector
Speed up the method RequiresStackProtector by checking the intrinsic
value of the call. The original code calls getName() that returns an
allocating std::string on each check. This change removes about 96072
std::string instances when compiling sqlite3.c; The function was
discovered with a Facebook-internal performance tool.

Differential Revision: https://reviews.llvm.org/D84620
2020-07-27 10:07:47 -07:00
Craig Topper 51e1c028d4 [X86] Add back comment inadvertently lost in 1a1448e656. 2020-07-27 10:02:38 -07:00
Francesco Petrogalli dbeb184b7f [NFC][AArch64] Replace some template methods/invocations...
...with the non-template version, as the template version might
increase the size of the compiler build.

Methods affected:

1.`findAddrModeSVELoadStore`
2. `SelectPredicatedStore`

Also, remove the `const` qualifier from the `unsigned` parameters of
the methods to conform with other similar methods in the class.
2020-07-27 16:52:08 +00:00
Simon Pilgrim 4d84d94969 [X86][SSE] Relax 128-bit restriction on extract_subvector(ext_vector_inreg(X),0) -> ext_vector_inreg(extract_subvector(X,0)) fold
We only need to ensure that the source is larger than the subvector result type
2020-07-27 17:50:36 +01:00
Jon Roelofs 88ce9f9b44 [TableGen][CGS] Print better errors on overlapping InstRW
Differential Revision: https://reviews.llvm.org/D83588
2020-07-27 09:41:10 -06:00
Logan Smith a52aea0ba6 Use INTERFACE_COMPILE_OPTIONS to disable -Wsuggest-override for any target that links to gtest
This cleans up several CMakeLists.txt's where -Wno-suggest-override was manually specified. These test targets now inherit this flag from the gtest target.

Some unittests CMakeLists.txt's, in particular Flang and LLDB, are not touched by this patch. Flang manually adds the gtest sources itself in some configurations, rather than linking to LLVM's gtest target, so this fix would be insufficient to cover those cases. Similarly, LLDB has subdirectories that manually add the gtest headers to their include path without linking to the gtest target, so those subdirectories still need -Wno-suggest-override to be manually specified to compile without warnings.

Differential Revision: https://reviews.llvm.org/D84554
2020-07-27 08:37:01 -07:00
jasonliu c25f61cf6a [XCOFF][AIX] Handle llvm.used and llvm.compiler.used global array
For now, just return and do nothing when we see llvm.used and
llvm.compiler.used global array.
Hopefully, we could come up with a good solution later to prevent
linker from eliminating symbols in llvm.used array.

Reviewed By: DiggerLin, daltenty

Differential Revision: https://reviews.llvm.org/D84363
2020-07-27 15:28:32 +00:00
Jon Roelofs f5e1ec8c58 [AArch64] fjcvtzs,rmif,cfinv,setf* all clobber nzcv
Differential Revision: https://reviews.llvm.org/D83818
2020-07-27 09:17:53 -06:00
Amy Kwan 7c182663a8 Revert "Re-apply:" Emit DW_OP_implicit_value for Floating point constants""
This patch reverts commit `59a76d957a26` as it has caused failure on the
big endian PowerPC buildbots (as well as the SystemZ buildbots).
2020-07-27 09:44:13 -05:00
Guillaume Chatelet 1b4d24912a [NFC] Replace ".size() < 1" with ".empty()" 2020-07-27 13:54:53 +00:00
Simon Pilgrim ab4ffa52f0 [X86][AVX] Fold extract_subvector(truncate(x),0) -> truncate(extract_subvector(x),0)
This is currently only supported for VLX targets where the op should be legal.
2020-07-27 14:51:29 +01:00
Simon Pilgrim f720c9c68c [X86] combineExtractSubvector - pull out repeated getSizeInBits() calls. NFCI. 2020-07-27 14:51:28 +01:00
Simon Pilgrim 5b5b3ce0ad IRPrintingPasses.h - simplify unnecessary header with forward declarations. NFC.
Remove duplicate PassManager.h include in IRPrintingPasses.cpp
2020-07-27 14:51:28 +01:00
Sergey Dmitriev bec77ece14 [CallGraph] Preserve call records vector when replacing call edge
Summary:
Try not to resize vector of call records in a call graph node when
replacing call edge. That would prevent invalidation of iterators
stored in the CG SCC pass manager's scc_iterator.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84295
2020-07-27 06:02:55 -07:00
Tim Northover 0f1494be43 AArch64: avoid UB shift of negative value
Left shifting a negative value is undefined behaviour, so this just moves the
negation afterwards to avoid it.
2020-07-27 13:49:50 +01:00
Roman Lebedev 1da9834557
[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857)
SplitBlockPredecessors() can not split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
2020-07-27 15:39:03 +03:00
Nathan James d127112724
[llvm][NFC] Silence unused variable warning by using isa over dyn_cast 2020-07-27 13:37:21 +01:00
Tim Northover 216b67e202 AArch64: diagnose out of range relocation addends on MachO.
MachO only has 24-bit addends for most relocations, small enough that it can
overflow in semi-reasonable functions and cause insidious bugs if compiled
without assertions enabled. Switch it to an actual error instead.

The condition isn't quite identical because ld64 treats the addend as a signed
number.
2020-07-27 13:01:22 +01:00
Guillaume Chatelet d9bbe85943 [Alignment][NFC] Update Bitcodewriter to use Align
Differential Revision: https://reviews.llvm.org/D83533
2020-07-27 08:16:45 +00:00
Juneyoung Lee e1eacf27c6 [InstCombine] Fold freeze into phi if one operand is not undef
This patch adds folding freeze into phi if it has only one operand to target.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84601
2020-07-27 17:07:27 +09:00
Piotr Sobczak 590dd73c6e [AMDGPU] Make generating cache invalidating instructions optional
Summary:
D78800 skipped generating cache invalidating instrucions altogether
on AMDPAL. However, this is sometimes too restrictive - we want a
more flexible option to be able to toggle this behaviour on and off
while we work towards developing a correct implementation of the
alternative memory model.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84448
2020-07-27 09:24:11 +02:00
David Sherwood 14bc85e0eb [SVE] Don't use LocalStackAllocation for SVE objects
I have introduced a new TargetFrameLowering query function:

  isStackIdSafeForLocalArea

that queries whether or not it is safe for objects of a given stack
id to be bundled into the local area. The default behaviour is to
always bundle regardless of the stack id, however for AArch64 this is
overriden so that it's only safe for fixed-size stack objects.
There is future work here to extend this algorithm for multiple local
areas so that SVE stack objects can be bundled together and accessed
from their own virtual base-pointer.

Differential Revision: https://reviews.llvm.org/D83859
2020-07-27 08:22:01 +01:00
biplmish 825ed2d43d [PowerPC] Add Vector Extract Double Instruction Definitions and MC tests.
This patch adds the td definitions and asm/disasm tests for the following instructions:

Vector Extract Double Left Index - vextdubvlx, vextduhvlx, vextduwvlx, vextddvlx
Vector Extract Double Right Index - vextdubvrx, vextduhvrx, vextduwvrx, vextddvrx

Differential Revision: https://reviews.llvm.org/D84384
2020-07-26 23:56:19 -05:00
Fangrui Song fae221e7ad [gcov] Simplify/speed up CFG hash calculation 2020-07-26 21:15:33 -07:00
Matt Arsenault e97aa5609f AMDGPU/GlobalISel: Don't assert in LegalizerInfo constructor
We don't really need these asserts. The LegalizerInfo is also
overly-aggressivly constructed, even when not in use. It needs to not
assert on dummy targets that have manually specified, unrelated
features.
2020-07-26 23:01:28 -04:00
QingShan Zhang a6e9f5264c [Scheduling] Improve group algorithm for store cluster
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads.
Current implementation will put these two stores into different group and miss to cluster them.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D84139
2020-07-27 02:02:40 +00:00
Lang Hames 47a40eda17 [ORC] Remove a redundant call to getTargetMemory. 2020-07-26 17:34:31 -07:00
Craig Topper df12524e6b [X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops
Previously we just matched the logic ops and replaced with an
X86ISD::VPTERNLOG node that we would send through the normal
pattern match. But that approach couldn't handle a bitcast
between the logic ops. Extending that approach would require us
to peek through the bitcasts and emit new bitcasts to match
the types. Those new bitcasts would then have to be properly
topologically sorted.

This patch instead switches to directly emitting the
MachineSDNode and skips the normal tablegen pattern matching.
We do have to handle load folding and broadcast load folding
ourselves now. Which also means commuting the immediate control.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D83630
2020-07-26 12:19:08 -07:00
Craig Topper 1a75d88b3e [X86] Move getGatherOverhead/getScatterOverhead into X86TargetTransformInfo.
These cost methods don't make much sense in X86Subtarget. Make
them methods in X86's TTI and move the feature checks from the
X86Subtarget constructor into these methods.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D84594
2020-07-26 10:38:42 -07:00
Simon Pilgrim 17eafe0841 [X86][SSE] lowerV2I64Shuffle - use undef elements in PSHUFD mask widening
If we lower a v2i64 shuffle to PSHUFD, we currently clamp undef elements to 0, (elements 0,1 of the v4i32) which can result in the shuffle referencing more elements of the source vector than expected, affecting later shuffle combines and KnownBits/SimplifyDemanded calls.

By ensuring we widen the undef mask element we allow getV4X86ShuffleImm8 to use inline elements as the default, which are more likely to fold.
2020-07-26 16:04:22 +01:00
Matt Arsenault d35e2c101d AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands 2020-07-26 10:17:36 -04:00
Matt Arsenault f6176f8a5f GlobalISel: Handle G_PTR_ADD in narrowScalar 2020-07-26 10:08:17 -04:00
Matt Arsenault 3e8bb7a000 GlobalISel: Handle fewerElementsVector for G_PTR_ADD 2020-07-26 10:08:09 -04:00
Matt Arsenault 7c09c173a2 AMDGPU/GlobalISel: Reorder G_CONSTANT legality rules
The legal cases should be the first rules.
2020-07-26 10:05:05 -04:00
Matt Arsenault bcf5184a68 AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized 2020-07-26 10:04:47 -04:00
Matt Arsenault 6f961a1e7e AMDGPU/GlobalISel: Legalize GDS atomics
I noticed these don't use the _gfx9, non-m0 reading variants but not
sure if that's a bug or not. It's the same in the DAG.
2020-07-26 10:03:34 -04:00
Matt Arsenault 5819159995 AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting 2020-07-26 09:55:34 -04:00
Sanjay Patel 0481e1ae3c [InstSimplify] fold integer min/max intrinsics with limit constant 2020-07-26 09:41:54 -04:00
Matt Arsenault 61ced4b87a GlobalISel: Handle 'n' inline asm constraint 2020-07-26 09:30:41 -04:00
Matt Arsenault 4033aa1467 AMDGPU/GlobalISel: Sign extend integer constants
This matches the DAG behavior and fixes immediate folding
2020-07-26 09:30:14 -04:00
Xing GUO b1731da871 [DWARFYAML] Rename getUsedSectionNames() to getNonEmptySectionNames().
This patch renames getUsedSectionNames() to getNonEmptySectionNames.
NFC.
2020-07-26 21:10:38 +08:00
Sanjay Patel b89ae102e6 [InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN
Follow-up to D84035 / rG7393d7574c09.
This sidesteps a question of FMF/poison on fcmp raised in PR46077:
http://bugs.llvm.org/PR46077

https://alive2.llvm.org/ce/z/TCsyzD
  define i1 @src(float %x) {
  %0:
    %x42 = fadd nnan ninf float %x, 42.000000
    %r = fcmp ueq float %x42, inf
    ret i1 %r
  }
  =>
  define i1 @tgt(float %x) {
  %0:
    ret i1 0
  }
  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/FQaH7a
  define i1 @src(i8 %x) {
  %0:
    %cast = uitofp i8 %x to float
    %r = fcmp one float inf, %cast
    ret i1 %r
  }
  =>
  define i1 @tgt(i8 %x) {
  %0:
    ret i1 1
  }
  Transformation seems to be correct!
2020-07-26 09:04:37 -04:00
Juneyoung Lee 32088f4f7f [ConstantFolding] Fold freeze if it is never undef or poison
This is a simple patch that adds constant folding for freeze
instruction.

IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84597
2020-07-26 21:54:44 +09:00
Juneyoung Lee 9f074214b7 [ValueTracking] Instruction::isBinaryOp should be used for constexprs
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84596
2020-07-26 21:48:51 +09:00
Amara Emerson 9b19400004 [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal types for G_SHUFFLE_VECTOR and G_IMPLICIT_DEF.
Trivial change, we're still missing support for rev matching for these types
in the combiner.
2020-07-26 00:48:09 -07:00
Craig Topper 1a1448e656 [X86] Merge X86MCInstLowering's maxLongNopLength into emitNop and remove check for FeatureNOPL.
The switch in emitNop uses 64-bit registers for nops exceeding
2 bytes. This isn't valid outside 64-bit mode. We could fix this
easily enough, but there are no users that ask for more than 2
bytes outside 64-bit mode.

Inlining the method to make the coupling between the two methods
more explicit.
2020-07-25 22:11:47 -07:00
Craig Topper 14c59b4577 [X86] Remove getProcFamily() method from X86Subtarget. NFC
This isn't used and we've decided in the past that a CPU enum
for tuning is not a good idea.
2020-07-25 22:11:45 -07:00
Changpeng Fang 9162b70e51 DADCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit
Summary:
  In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification with the consideration of compile time.

Reviewers:
  @spatel, @arsenm

Differential Revision:
  https://reviews.llvm.org/D84204
2020-07-25 21:20:59 -07:00
Craig Topper 1df8804ce5 [X86] Replace a use of ProcIntelSLM with FeatureFast7ByteNOP. 2020-07-25 20:46:48 -07:00
Eric Christopher 18975762c1 Fold StatepointBB into checks as it's only used from an NDEBUG or ASSERT
context fixing an unused variable warning.
2020-07-25 18:36:53 -07:00
Nemanja Ivanovic cdead4f89c [PowerPC][NFC] Fix an assert that cannot trip from 7d076e19e3
I mixed up the precedence of operators in the assert and thought I
had it right since there was no compiler warning. This just
adds the parentheses in the expression as needed.
2020-07-25 20:28:52 -04:00
Philip Reames 55dae9c20c [Statepoints] Style cleanup after 3da1a963 [NFC]
Just fixing a few minor stylistic issues.
2020-07-25 16:40:39 -07:00
Roman Lebedev 96d74530c0
[Reduce] Argument reduction: do deal with function declarations
We can happily turn function definitions into declarations,
thus obscuring their argument from being elided by this pass.

I don't believe there is a good reason to just ignore declarations.
likely even proper llvm intrinsics ones,
at worst the input becomes uninteresting.

The other question here is that all these transforms are all-or-nothing.
In some cases, should we be treating each use separately?

The main blocker here seemed to be that llvm::CloneFunctionInto()
does `&OldFunc->front()`, which inserts a nullptr into a densemap,
which is not happy about it and asserts.
2020-07-26 01:31:56 +03:00
Lang Hames a01c4ee71c [ORC] Rename TargetProcessControl DynamicLibraryHandle and loadLibrary.
The new names, DylibHandle and loadDylib, are more concise and make
clear that these utilities are for loading dynamic libraries, not static
ones.
2020-07-25 15:21:43 -07:00
Lang Hames 11d5316afd [ORC] Don't require PageSize or Triple during TargetProcessControl construction
Subclasses will commonly gather that information from a remote during
construction, in which case they won't have meaningful values to pass to
TargetProcessControl's constructor.
2020-07-25 15:21:43 -07:00
Philip Reames 3da1a9634e [Statepoints] Support lowering gc relocations to virtual registers
(Disabled under flag for the moment)

This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator.  Today, we force spill all operands in SelectionDAG.  The code to do so is distinctly non-optimal.  The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to.  In terms of performance, the later part is actually more important as it avoids redundant shuffling of values between stack slots.

This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above.  In particular, the first N are lowered to variadic tied def/use pairs.  So new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...

N is limited by the maximal number of tied registers machine instruction can have (15 at the moment).

The current patch is restricted to handling relocations within a single basic block.  Cross block relocations (e.g. invokes) are handled via the legacy mechanism.  This restriction will be relaxed in future patches.

Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
2020-07-25 14:26:05 -07:00
Nikita Popov bc79ed7e16 [LVI] Don't require operand number for range (NFC)
Pass the Value* instead of the operand number, rename I to CxtI.
This makes the function a bit more generally useful.
2020-07-25 16:33:45 +02:00
Matt Arsenault 392b969c32 AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits
Just fallback for now. Really tablegen needs to generate all of the
subregister index handling we need.
2020-07-25 10:05:44 -04:00
Nikita Popov 632a89e866 [SCCP] Restore the change reporting as well
Reapply 5db5b4bc43.
2020-07-25 15:11:30 +02:00
Nikita Popov ad16e71c95 Reapply [SCCP] Directly remove non-feasible edges
Reapply with DTU update moved after CFG update, which is a
requirement of the API.

-----

Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)

Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.

Differential Revision: https://reviews.llvm.org/D84264
2020-07-25 14:52:35 +02:00
Simon Pilgrim b5e14d78f1 SimplifyLibCalls - remove unnecessary header and forward declaration. NFC.
We include TargetLibraryInfo.h so don't need to forward declare it, and we don't need to include TargetLibraryInfo.h in SimplifyLibCalls.cpp as well.
2020-07-25 12:58:39 +01:00
Simon Pilgrim 3b21823e4a [X86][SSE] combineX86ShufflesRecursively - move all Root node asserts to the same location. NFCI.
Minor tidyup for some upcoming shuffle combine improvements.
2020-07-25 12:48:14 +01:00
Florian Hahn 3c1476d26c [IPSCCP] Drop argmemonly after replacing pointer argument.
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.

Fixes PR46717.

Reviewers: efriedma, davide, nikic, jdoerfert

Reviewed By: efriedma, jdoerfert

Differential Revision: https://reviews.llvm.org/D84432
2020-07-25 11:52:14 +01:00
Simon Pilgrim 66998ae59f [X86][SSE] getFauxShuffle - ignore undemanded sources for PACKSS/PACKUS faux shuffles
If we don't care about an entire LHS/RHS of the PACK op, then can just treat it the same as undef (we don't care if it saturates) and is safe to treat as a shuffle.

This can happen if we attempt to decode as a faux shuffle before SimplifyDemandedVectorElts has been called on the PACK which should replace the source with UNDEF entirely.
2020-07-25 10:51:14 +01:00
Jessica Paquette 604e33e83a [AArch64][GlobalISel] Look through constants when selection stores of 0
Very minor code size improvements (hits 8 times in Bullet at -O3), but still
something.

Also very minor NFC change to make sure we only search for a 0 constant when
selecting a store. Before, we'd do this for loads as well.

Differential Revision: https://reviews.llvm.org/D84573
2020-07-24 22:46:14 -07:00
Amy Kwan 739cd2638b [PowerPC] Exploit the High Order Vector Multiply Instructions on Power10
This patch aims to exploit the following vector multiply high instructions on Power10.
vmulhsw VRT, VRA, VRB
vmulhsd VRT, VRA, VRB
vmulhuw VRT, VRA, VRB
vmulhud VRT, VRA, VRB

Differential Revision: https://reviews.llvm.org/D82584
2020-07-24 20:57:57 -05:00
Rong Xu 1dd39b1133 [PGO] Fix incorrect function entry count
Function entry count might be zero after the profile counts reset and
before reentry to the function.

Zero profile entry count is very bad as the profile count from BFI will
be wrong.

A simple fix is to set the profile entry count to 1 if there are
non-zero profile counts in this function.

Differential Revision: https://reviews.llvm.org/D84378
2020-07-24 17:39:55 -07:00
Rong Xu 31bd15c562 [PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of incomplete profile -- if the
the loop is a long running loop and dump is called in the middle
of the loop, the result profile is incomplete.

ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.

Differential Revision: https://reviews.llvm.org/D84379
2020-07-24 17:38:31 -07:00
Rong Xu 5546c2ab42 Revert "[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction"
This reverts commit 6fdc6f6c7d.
2020-07-24 17:35:44 -07:00
Amy Kwan 74790a5dde [PowerPC] Implement Truncate and Store VSX Vector Builtins
This patch implements the `vec_xst_trunc` function in altivec.h in  order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82467
2020-07-24 19:22:39 -05:00
Jessica Paquette fcc55c0952 [AArch64][GlobalISel] Use wzr/xzr for 16 and 32 bit stores of zero
We weren't performing this optimization on 16 and 32 bit stores. SDAG happily
does this though.

e.g. https://godbolt.org/z/cWocKr

This saves about 0.2% in code size on CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D84568
2020-07-24 17:15:20 -07:00
Rong Xu 6fdc6f6c7d [PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of incomplete profile -- if the
the loop is a long running loop and dump is called in the middle
of the loop, the result profile is incomplete.

ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.

Differential Revision: https://reviews.llvm.org/D84379
2020-07-24 17:13:58 -07:00
Matt Arsenault 4b53072ee5 GlobalISel: Define mulfix/divfix opcodes
The full expansion involves the funnel shifts, which depend on another
patch to expand those.
2020-07-24 20:02:20 -04:00
Amara Emerson f320f83f3a [AArch64][GlobalISel] Promote G_UITOFP vector operands to same elt size as result.
Fixes legalization failures.
2020-07-24 17:00:50 -07:00
Alina Sbirlea 8bf4c1f4fb Reapply "[DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff."
This is the part of the patch that's moving the Updates to a CFGDiff
object. Splitting off from the clean-up work merging the two branches when BUI is null.

Differential Revision: https://reviews.llvm.org/D77341
2020-07-24 14:10:50 -07:00
Matt Arsenault 2bd72abef0 AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z
PHIElimination/createPHISourceCopy inserts non-branch terminators
after the control flow pseudo if a successor phi reads a register
defined by the control flow pseudo. If this happens, we need to split
the expansion of the control flow pseudo to ensure all the branches
are after all of the other mask management instructions.

GlobalISel hit this in testscases that happened to be tail
duplicated. The original testcase still does not work, since the same
problem appears to be present in a later pass.
2020-07-24 16:51:59 -04:00
Eli Friedman c02aa53ecb [AArch64][SVE] Add "fast" fcmp operations.
dacf8d3 added support for most fcmp operations, but there are some extra
variations I hadn't considered: SelectionDAG supports float comparisons
that are neither ordered nor unordered. Add support for the missing
operations.

Differential Revision: https://reviews.llvm.org/D84460
2020-07-24 13:22:41 -07:00
Johannes Doerfert aa09db495a [SROA] Teach promote to register about droppable instructions
This is the second of two patches to address PR46753. We basically allow
SROA to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The (transitive) uses are replaced by
`undef` in the droppable instructions.

See also D83976.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D83978
2020-07-24 15:15:39 -05:00
Johannes Doerfert ce8928f2e4 [Mem2Reg] Teach promote to register about droppable instructions
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D83976
2020-07-24 15:15:38 -05:00
Johannes Doerfert ce2d69b557 [SROA][Mem2Reg] Do not crash on alloca + addrspacecast
SROA knows that it can look through addrspacecast but
PromoteMemoryToRegister did not handle them. This caused an assertion
error for the test case, exposed while running
`Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978
applied.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84085
2020-07-24 15:15:38 -05:00
Gui Andrade 1e77b3af12 [MSAN] Allow inserting array checks
Flattens arrays by ORing together all their elements.

Differential Revision: https://reviews.llvm.org/D84446
2020-07-24 20:12:58 +00:00
Nemanja Ivanovic 7d076e19e3 [PowerPC] Fix computation of offset for load-and-splat for permuted loads
Unfortunately this is another regression from my canonicalization patch
(1fed131660). The patch contained two implicit assumptions:
1. That we would have a permuted load only if we are loading a partial vector
2. That a partial vector load would necessarily be as wide as the splat

However, assumption 2 is not correct since it is possible to do a wider
load and only splat a half of it. This patch corrects this assumption by
simply checking if the load is permuted and adjusting the offset if it is.
2020-07-24 15:38:46 -04:00
Martin Storsjö 9e81d8bbf1 [MC] [COFF] Make sure that weak external symbols are undefined symbols
For comdats (e.g. caused by -ffunction-sections), Section is already
set here; make sure it's null, for the weak external symbol to be undefined.

This fixes PR46779.

Differential Revision: https://reviews.llvm.org/D84507
2020-07-24 22:15:08 +03:00
Martin Storsjö 4d09ed953b [llvm-lib] Support adding short import library objects with llvm-lib
This fixes PR 42837.

Differential Revision: https://reviews.llvm.org/D84465
2020-07-24 22:15:08 +03:00
Arthur Eubanks 9bb6ce78be Rename scoped-noalias -> scoped-noalias-aa
Summary: To match NewPM name. Also the new name is clearer and more consistent.

Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84542
2020-07-24 12:14:27 -07:00
madhur13490 4a577c3a22 [AMDGPU] Fix incorrect arch assert while setting up FlatScratchInit
Reviewers: arsenm, foad, rampitec, scott.linder

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84391
2020-07-24 18:19:04 +00:00
Craig Topper 945ed22f33 [X86] Move the implicit enabling of sse2 for 64-bit mode from X86Subtarget::initSubtargetFeatures to X86_MC::ParseX86Triple.
ParseX86Triple already checks for 64-bit mode and produces a
static string. We can just add +sse2 to the end of that static
string. This avoids a potential reallocation when appending it
to the std::string at runtime.

This is a slight change to the behavior of tools that only use
MC layer which weren't implicitly enabling sse2 before, but will
now. I don't think we check for sse2 explicitly in any MC layer
components so this shouldn't matter in practice. And if it did
matter the new behavior is more correct.
2020-07-24 11:14:20 -07:00
Francesco Petrogalli 809600d664 [llvm][sve] Reg + Imm addressing mode for ld1ro.
Reviewers: kmclaughlin, efriedma, sdesmalen

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83357
2020-07-24 17:48:47 +00:00
Craig Topper 8158f0cefe [X86] Use X86_MC::ParseX86Triple to add mode features to feature string in X86Subtarget::initSubtargetFeatures.
Remove mode flags from constructor and remove calls to
ToggleFeature for the mode bits.

By adding them to the feature string we handle initializing the
mode member variables in X86Subtarget and the feature bits in
MCSubtargetInfo in one shot.
2020-07-24 10:48:22 -07:00
Meera Nakrani db37937a47 [ARM] Added additional patterns to VABD instruction
Added extra patterns to VABD instruction so it is selected in place of VSUB and VABS. Added corresponding regression test too.

Differential Revision: https://reviews.llvm.org/D84500
2020-07-24 17:46:25 +00:00
Meera Nakrani 805e6bcf22 Test Commit
Test commit - added whitespace in ARMInstrMVE.td
2020-07-24 17:22:56 +00:00
Florian Hahn 1c7c69c795 [ValueTracking] Check for ConstantExpr before using recursive helpers.
Make sure we do not call
constainsConstantExpression/containsUndefElement on ConstantExpression,
which is not supported.

In particular, containsUndefElement/constainsConstantExpression are only
supported on constants which are supported by getAggregateElement.

Unfortunately there's no convenient way to check if a constant supports
getAggregateElement, so just check for non-constantexpressions with
vector type. Other users of those functions do so too.

Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84512
2020-07-24 17:37:09 +01:00
Nicolai Hähnle 5934df0c9a MachineBasicBlock: add printName method
Common up some existing MBB name printing logic into a single place.
Note that basic block dumping now prints the same set of attributes as
the MIRPrinter.

Change-Id: I8f022bbd922e831bc96d63143d7472c03282530b

Differential Revision: https://reviews.llvm.org/D83253
2020-07-24 18:18:09 +02:00
Dmitry Preobrazhensky 6b8948922c [AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.

    op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

    op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax variant is supported for GFX10 only.

See llvm bug 37738

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D84026
2020-07-24 16:41:03 +03:00
Djordje Todorovic 6371a0a00e [DWARF][EntryValues] Emit GNU extensions in the case of DWARF 4 + SCE
Emit DWARF 5 call-site symbols even though DWARF 4 is set,
only in the case of LLDB tuning.

This patch addresses PR46643.

Differential Revision: https://reviews.llvm.org/D83463
2020-07-24 14:33:57 +02:00
Simon Pilgrim 0128b9505c Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI."
This reverts commit 5dd566b7c7.

Causing some buildbot failures that I'm not seeing on MSVC builds.
2020-07-24 13:02:33 +01:00
Simon Pilgrim 5dd566b7c7 PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI.
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.

This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
2020-07-24 12:40:50 +01:00
Djordje Todorovic cbb3571b0d [DWARF] Avoid entry_values production for SCE
SONY debugger does not prefer debug entry values feature, so
the plan is to avoid production of the entry values
by default when the tuning is SCE debugger.

The feature still can be enabled with the -debug-entry-values
option for the testing/development purposes.

This patch addresses PR46643.

Differential Revision: https://reviews.llvm.org/D83462
2020-07-24 13:34:05 +02:00
Xing GUO bbb057c49a [DWARFYAML] Replace 'Format', 'Version', etc with 'FormParams'. NFC.
This patch replaces 'Format', 'Version' fields, etc with 'FormParams' to
simplify codes.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D84496
2020-07-24 16:54:51 +08:00
Petar Avramovic 47bd41d099 AMDGPU/GlobalISel: Select set.inactive intrinsic
Differential Revision: https://reviews.llvm.org/D84407
2020-07-24 10:14:14 +02:00
Craig Topper 205e8b7e89 [X86] Make the X86ProcFamilyEnum private to X86Subtarget. Removed unneeded 'protected' from X86Subtarget. NFC 2020-07-23 23:42:11 -07:00
Petr Hosek 10b1b4a231 [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-07-23 23:05:36 -07:00
Xing GUO 367d0d4c32 [DWARFYAML] Use writeDWARFOffset() to simplify emitting offsets. NFC.
This patch uses writeDWARFOffset() to simplify some codes. NFC.
2020-07-24 12:15:18 +08:00
Craig Topper 8131e19064 [LegalizeTypes] Teach DAGTypeLegalizer::GenWidenVectorLoads to pad with undef if needed when concatenating small or loads to match a larger load
In the included test case the align 16 allowed the v23f32 load to handled as load v16f32, load v4f32, and load v4f32(one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need to two v4f32 undefs to pad it.

It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. Originally in r147964 by padding all loads narrower than previous loads to the same size. Later modifed to only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it.

Fixes PR46820

Differential Revision: https://reviews.llvm.org/D84463
2020-07-23 19:02:03 -07:00
Matt Arsenault 891759db73 GlobalISel: Add scalarSameSizeAs LegalizeRule
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
2020-07-23 21:17:31 -04:00
Fangrui Song 4637daa990 Revert D84264 "[SCCP] Directly remove non-feasible edges" & 5db5b4bc43
It breaks stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp

llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
2020-07-23 17:51:48 -07:00
Eli Friedman 993c1a3219 [AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4.
It's sort of tricky to hit this in practice, but not impossible. I have
a synthetic C testcase if anyone is interested.

The implementation is identical to the equivalent NEON register copies.

Differential Revision: https://reviews.llvm.org/D84373
2020-07-23 16:41:37 -07:00
Lang Hames 69091eb1c4 [ORC] Enable use of TargetProcessControl::getMemMgr with ObjectLinkingLayer.
This patch makes ownership of the JITLinkMemoryManager by ObjectLinkingLayer
optional: the layer can still own the memory manager but no longer has to.

Evevntually we want to move to a state where ObjectLinkingLayer never owns its
memory manager. For now allowing optional ownership makes it easier to develop
classes that can dynamically use either RTDyldObjectLinkingLayer, which owns
its memory managers, or ObjectLinkingLayer (e.g. LLJIT).
2020-07-23 16:18:57 -07:00
Amy Kwan 1dc1a3fb0c [PowerPC] Implement low-order Vector Multiply, Modulus and Divide Instructions
This patch aims to implement the low order vector multiply, divide and modulo
instructions available on Power10.

The patch involves legalizing the ISD nodes MUL, UDIV, SDIV, UREM and SREM for
v2i64 and v4i32 vector types in order to utilize the following instructions:
- Vector Multiply Low Doubleword: vmulld
- Vector Modulus Word/Doubleword: vmodsw, vmoduw, vmodsd, vmodud
- Vector Divide Word/Doubleword: vdivsw, vdivsd, vdivuw, vdivud

Differential Revision: https://reviews.llvm.org/D82510
2020-07-23 17:18:36 -05:00
Petr Hosek 38c71b7c85 Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit 1d09ecf361 since
it breaks sanitizer bots.
2020-07-23 15:12:42 -07:00
Eric Christopher 3ac828b8f7 Use llvm::size rather than an empty loop to get the number of top
level loops.
2020-07-23 14:55:50 -07:00
Petr Hosek 1d09ecf361 [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-07-23 14:47:25 -07:00
Amara Emerson 645e7fc542 [GlobalISel] Use existing MIR builder instead of creating one in combiner. 2020-07-23 14:16:45 -07:00
Sidharth Baveja 38a8217931 [Loop Fusion] Integrate Loop Peeling into Loop Fusion (re-land after fixing ASAN build failures)
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.

Here is a simple scenario peeling can be used in loop fusion:

for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;

Here is we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.

a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}

Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.

Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)

Differential Revision: https://reviews.llvm.org/D82927
2020-07-23 21:02:04 +00:00
David Green b37e92201c [ARM] Add predicated mla reduction patterns
Similar to 8fa824d7a3 but this time for MLA patterns, this selects
predicated vmlav/vmlava/vmlalv/vmlava instructions from
vecreduce.add(select(p, mul(x, y), 0)) nodes.

Differential Revision: https://reviews.llvm.org/D84102
2020-07-23 21:47:59 +01:00
Steven Wu ac375c2fe3 [Bitcode] Avoid duplicating linker option when upgrading
Summary:
The upgrading path from old ModuleFlag based linker options to the new
NamedMetadata based linker option in in materializeMetadata() which gets
called once for the module and once for every GV. The linker options are
getting dup'ed every time and it can create massive amount of the linker
options in the object file that gets created from old bitcode. Fix the
problem by checking if the new option exists or not before upgrade
again.

rdar://64543389

Reviewers: pcc, t.p.northover, dexonsmith, arphaman

Reviewed By: arphaman

Subscribers: hiraditya, jkorous, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83688
2020-07-23 13:07:28 -07:00
Tarindu Jayatilaka 06283661b3 Add new function properties to FunctionPropertiesAnalysis
Added  LoadInstCount, StoreInstCount, MaxLoopDepth, LoopCount

Reviewed By: jdoerfert, mtrofin

Differential Revision: https://reviews.llvm.org/D82283
2020-07-23 12:46:47 -07:00
Matt Arsenault b9c644ec61 AMDGPU: Fix failures from overflowing uint8_t number of operands
If the operand index exceeded the limit of unsigned char, it wrapped
and would point to the wrong operand. Increase the size of the operand
index field to avoid this, and also don't bother trying to fold into
implicit operands.
2020-07-23 15:39:33 -04:00
Amara Emerson 3b10e42ba1 [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-23 12:06:35 -07:00
Nikita Popov 5db5b4bc43 [SCCP] Add missing change reporting
Forgot to actually use the return value of the function.
2020-07-23 20:58:29 +02:00
Tarindu Jayatilaka ee6f0e109c Add a Printer to the FunctionPropertiesAnalysis
A printer pass and a lit test case was added.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82523
2020-07-23 11:57:11 -07:00
Nikita Popov deb4bb2b3a [IR] Add min/max/abs intrinsics
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.

Differential Revision: https://reviews.llvm.org/D84125
2020-07-23 20:56:19 +02:00
Tarindu Jayatilaka 2f56046d7c Refactor FunctionPropertiesAnalysis
this separates  `analyze` logic from  `FunctionPropertiesAnalysis`

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82521
2020-07-23 11:49:10 -07:00
Nikita Popov 9394c3ec88 [SCCP] Directly remove non-feasible edges
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)

Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.

Differential Revision: https://reviews.llvm.org/D84264
2020-07-23 20:32:57 +02:00
Matt Arsenault d2b8fcff34 AMDGPU/GlobalISel: Handle call return values
The only case that I know doesn't work is the implicit sret case when
the return type doesn't fit in the return registers.
2020-07-23 14:29:35 -04:00
Nikita Popov def48b0e88 [PredicateInfo][SCCP] Remove assertion (PR46814)
As long as RenamedOp is not guaranteed to be accurate, we cannot
assert here and should just return false. This was already done
for the other conditions in this function.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.
2020-07-23 19:36:51 +02:00
Gui Andrade 3285b24249 [MSAN] Allow emitting checks for struct types
Differential Revision: https://reviews.llvm.org/D82680
2020-07-23 16:50:59 +00:00
Simon Pilgrim 7eb213499e RegionInfo.cpp - remove duplicate includes that already exist in RegionInfo.h. NFC.
Also remove some unnecessary forward declarations in RegionInfo.h.
2020-07-23 17:50:22 +01:00
Gui Andrade 0025d52c0f [MSAN] Never allow checking calls to __sanitizer_unaligned_{load,store}
These functions expect the caller to always pass shadows over TLS.

Differential Revision: https://reviews.llvm.org/D84351
2020-07-23 16:42:59 +00:00
Craig Topper 5dbcf5e3cc [X86] Add Feature64Bit to the 'generic' CPU and remove feature string hacking in X86Subtarget constructor
Feature64Bit is only used by a check in the X86Subtarget
constructor to ensure that the CPU selected supports 64-bit mode
when the triple is for 64-bit mode.

'generic' is the default CPU in llc and so needs to be able to
pass this check. Previously we did this by detecting the name and
adding the feature to the feature string. But there doesn't seem
to be any reason we can't just add the feature to the CPU directly.
2020-07-23 09:16:18 -07:00
Steven Wu 78709345fb [Bitcode] Drop invalid branch_weight in BitcodeReader
Summary:
If bitcode reader gets an invalid branch weight, drop that from the
inputs. This allows us to read the broken modules we generated before
the verifier was able to catch this.

rdar://64870641

Reviewers: yrouban, t.p.northover, dexonsmith, arphaman, aprantl

Reviewed By: aprantl

Subscribers: aprantl, hiraditya, jkorous, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83699
2020-07-23 09:07:15 -07:00
Mircea Trofin 302e91baf4 [llvm][NFC] Add comments and common-case API to MachineBlockFrequencyInfo
Clarify the relation between a block's BlockFrequency and the
getEntryFreq() API, and added an API for the relatively common case of
finding a block's frequency relative to the entrypoint.

Added / moved some comments to header.

Differential Revision: https://reviews.llvm.org/D84357
2020-07-23 08:42:34 -07:00
Simon Pilgrim 86fd5be6fd AggressiveInstCombine.h - remove unused includes. NFC. 2020-07-23 16:20:13 +01:00
Simon Pilgrim 9c81c2372d PassTimingInfo.h - remove unused includes. NFC.
Remove duplicate includes from PassTimingInfo.cpp that already exist in PassTimingInfo.h
2020-07-23 16:20:13 +01:00
Evgeny Leviant dc619f3d7a [CodeGen][TargetPassConfig] Add unreachable-mbb-elimination pass explicitly
Differential revision: https://reviews.llvm.org/D84228
2020-07-23 18:05:11 +03:00
Braedy Kuzma 24e41a34fe [Matrix] Add asserts for mismatched element types.
This patch clarifies the failing point of having input or output vectors
of differing types. Before, lowering would fail elsewhere (e.g. in
`fmul` creation) which may have been not immediately clear.

As a side effect, the `getElementType` and `getVectoryTy` functions
required the `const` qualifier to be added.

Reviewers: fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D84374
2020-07-23 16:02:48 +01:00
Sebastian Neubauer 896679733d [AMDGPU] Fix typo. NFC 2020-07-23 17:01:12 +02:00
Xing GUO 92874d2866 [DWARFYAML] Refactor emitDebugInfo() to make the length be inferred.
This patch refactors `emitDebugInfo()` to make the length field be
inferred from its content. Besides, the `Visitor` class is removed in
this patch. The original `Visitor` class helps us determine an
appropriate length and emit the .debug_info section. These two
processes can be merged into one process. Besides, the length field
should be inferred when it's missing rather than when it's zero.

Reviewed By: jhenderson, labath

Differential Revision: https://reviews.llvm.org/D84008
2020-07-23 23:00:19 +08:00
Xing GUO a997e6edb9 [DWARFYAML] Pull out common helper functions for rnglist and loclist tables. NFC.
This patch helps pull out some common helper functions for range list
and location list tables. NFC.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D84383
2020-07-23 22:57:30 +08:00
Simon Pilgrim d720ba1e4b [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add SSE shift multiple use handling
Add SimplifyMultipleUseDemandedVectorElts peek through for imm/var SSE shifts
2020-07-23 14:39:03 +01:00
Ulrich Weigand 68a80a4436 [SystemZ] Ensure -mno-vx disables any use of vector features
When passing the -vector feature to LLVM (or equivalently the
-mno-vx command line argument to clang), the intent is that
generated code must not use any vector features (in particular,
no vector registers must be used).

However, there are some cases where we still could generate
such uses; these are all related to some of the additional
vector features (like +vector-enhancements-1).  Since none
of those features are actually usable with -vector, just make
sure we disable them all if -vector is given.
2020-07-23 15:34:59 +02:00
Florian Hahn ecd3f853a8 [SCEVExpander] Use IRBuilderCallbackInserter to call rememberInstruction.
Currently there are plenty of instructions that SCEVExpander creates but
does not track as created. IRBuilder allows specifying a callback
whenever an instruction is inserted. Use this to call
rememberInstruction automatically for each created instruction.

There are still a few rememberInstruction calls remaining, because in
some cases Inst::Create functions are used to construct instructions.

Suggested by @lebedev.ri in D75980.

Reviewers: mkazantsev, reames, sanjoy.google, lebedev.ri

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D84326
2020-07-23 14:25:28 +01:00
Jay Foad b35833b84e [GlobalISel][AMDGPU] Legalize saturating add/subtract
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.

Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skips out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
2020-07-23 09:06:42 -04:00
Sanjay Patel 7485e92412 [InstSimplify] reduce code duplication for binop expansion; NFC
D84250 proposes to extend this code, so the duplication for
the commuted case would continue to grow.
2020-07-23 08:35:21 -04:00
Simon Pilgrim 1003113ef0 Fix -Wparentheses warning - add missing brackets around the entire assertion condition 2020-07-23 13:33:24 +01:00
Shinji Okumura 697c6d8907 [Attributor] Cache query results for isPotentiallyReachable in AAReachability
Summary:
This is the next patch of [[ https://reviews.llvm.org/D76210 | D76210 ]].
This patch made a map in `InformationCache` for caching results.

Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, kuter, bbn, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83246
2020-07-23 20:49:28 +09:00
Konstantin Schwarz 931488779f [GlobalISel][InlineAsm] Add register class ID to the flags of register input operands
Summary: We do this already for output operands, but missed it for (non-tied) input operands.

Reviewers: arsenm, Petar.Avramovic

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83763
2020-07-23 13:35:01 +02:00
Simon Pilgrim 5b20c14525 ValueProfileCollector.h - remove unnecessary includes. NFC. 2020-07-23 12:33:13 +01:00
Simon Pilgrim f758d72eb8 Speculation.h - remove unnecessary includes. NFC. 2020-07-23 11:58:23 +01:00
Florian Hahn 6c9da995fc [ScheduleDAGRRList] Pacify overload mismatch in std::min.
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
2020-07-23 11:56:50 +01:00
Florian Hahn 2f8e6b5f3c [ScheduleDAGRRList] Limit number of candidates to explore.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.

define void @test(i4000000* %ptr) {
entry:
  store i4000000 0, i4000000* %ptr, align 4
  ret void
}

This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.

It would be even better to use a binary heap to manage to queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.

The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.

Reviewers: efriedma, paquette, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84328
2020-07-23 11:35:33 +01:00
Sourabh Singh Tomar 8998f8ab66 [DebugInfo] Attempt to fix regression test failure after 59a76d957a
Test case `test/CodeGen/WebAssembly/stackified-debug.ll`
was failing due to malformed DwarfExpression.

This failure has been seen in lot of bots, for instance in:
http://lab.llvm.org:8011/builders/lld-x86_64-ubuntu-fast/builds/18794

: 'RUN: at line 1'
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/llc
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/FileCheck /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll
home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll:26:10: error: CHECK: expected string not found in input
 CHECK: .int16 4 # Loc expr size
         ^
<stdin>:34:2: note: scanning from here
 .int16 3 # Loc expr size

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 14:55:30 +05:30
Sourabh Singh Tomar 59a76d957a Re-apply:" Emit DW_OP_implicit_value for Floating point constants"
This patch was reverted in 9d2da6759b due to assertion failure seen
in `test/DebugInfo/Sparc/subreg.ll`. Assertion failure was happening
due to malformed/unhandeled DwarfExpression.

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 13:56:20 +05:30
Serge Pavlov dab898f9ab [Windows] Fix limit on command line size
This reapplies commit d4020ef7c4, reverted in ac0edc5588 because it
broke build of LLDB. This commit contains appropriate changes for LLDB.
The original commit message is below.

Documentation on CreateProcessW states that maximal size of command line
is 32767 characters including ternimation null character. In the
function llvm::sys::commandLineFitsWithinSystemLimits this limit was set
to 32768. As a result if command line was exactly 32768 characters long,
a response file was not created and CreateProcessW was called with
too long command line.

Differential Revision: https://reviews.llvm.org/D83772
2020-07-23 11:39:42 +07:00
Hiroshi Yamauchi 557db6f8aa Reland D84057 [PGO][PGSO] Remove a temporary flag used for gradual rollout.
The revert was a misfire.

Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.

Differential Revision: https://reviews.llvm.org/D84057
2020-07-22 20:57:25 -07:00
Sourabh Singh Tomar 9d2da6759b Revert "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants"
This reverts commit 6b55a95898.
Temporal revert due to a failing/assertion in test case in Sparc backend.
`test/DebugInfo/Sparc/subreg.ll`
Seen in lot of bots, for instance in:
`http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24679`
2020-07-23 08:50:01 +05:30
Xing GUO c4cf250c5b [DWARFYAML] Refactor range list table to hold more data structure.
This patch refactors the range list table to hold both the range list
table and the location list table.

Reviewed By: jhenderson, labath

Differential Revision: https://reviews.llvm.org/D84239
2020-07-23 10:25:01 +08:00
Sourabh Singh Tomar 6b55a95898 [DebugInfo] Emit DW_OP_implicit_value for Floating point constants
Summary:
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.

For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions

Consider the following example:
```
int main() {
        long double ld = 3.14;
        printf("dummy\n");
        ld *= ld;
        return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location        (0x00000000:
                     [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
                  DW_AT_name    ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```

GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.

Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html

GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83560
2020-07-23 07:21:49 +05:30
Logan Smith 77e0e9e17d Reapply "Try enabling -Wsuggest-override again, using add_compile_options instead of add_compile_definitions for disabling it in unittests/ directories."
add_compile_options is more sensitive to its location in the file than add_definitions--it only takes effect for sources that are added after it. This updated patch ensures that the add_compile_options is done before adding any source files that depend on it.

Using add_definitions caused the flag to be passed to rc.exe on Windows and thus broke Windows builds.
2020-07-22 17:50:19 -07:00
Craig Topper ebe5f17f9c [X86] Remove the DeprecatedMPX feature flag.
We deprecated mpx feature in 10.0. I left this feature flag
in case someone still had IR files containing the feature
in a target-feature attribute. At the time I think I thought it
would fail the test if the feature couldn't be found. Further
review suggests that at worst it prints a message to
stderr about ignoring the feature.
2020-07-22 17:44:07 -07:00
Amy Huang 724bf4ee23 [Symbolize][PDB] Switch llvm-symbolizer to use PDB_ReaderType::Native.
Since native PDB reading has been implemented for symbolizing,
switch to using the native PDB reader by default, unless
LLVM_ENABLE_DIA_SDK is on.

Bug: https://bugs.llvm.org/show_bug.cgi?id=41795

Differential Revision: https://reviews.llvm.org/D84286
2020-07-22 17:17:57 -07:00
Craig Topper b2c65beb14 [X86] Rework the "sahf" feature flag to only apply to 64-bit mode.
SAHF/LAHF instructions are always available in 32-bit mode. Early
64-bit capable CPUs made the undefined opcodes in 64-bit mode. This
was changed on later CPUs.

We have a feature flag to control our usage of these instructions.
This feature flag is hooked up to a clang command line option
-msahf/-mno-sahf specifically to give control of the 64-bit mode
behavior.

In the backend X86Subtarget constructor we were explicitly forcing
+sahf into the feature flag string if we were not compiling for
64-bit mode. This was intended to make the predicates always allow
the instructions outside of 64-bit mode. Unfortunately, the way
it was placed into the string allowed -mno-sahf from clang to disable
SAHF instructions in 32-bit mode. This causes an assertion to fire
if you compile a floating point comparison with something like
"-march=pentium -mno-sahf" as our floating point comparison
handling on CPUs that don't support FCOMI/FUCOMI instructions
requires SAHF.

To fix this, this commit restricts the feature flag to only apply to
64-bit mode by ignoring the flag outside 64-bit mode in
X86Subtarget::hasLAHFSAHF(). This way we don't need to mess with
the feature string at all.
2020-07-22 16:57:46 -07:00
Lang Hames 13ad00be98 [ORC] Add a TargetProcessControl-based dynamic library search generator.
TPCDynamicLibrarySearchGenerator uses a TargetProcessControl instance to
load libraries and search for symbol addresses in a target process. It
can be used in place of a DynamicLibrarySearchGenerator to enable
target-process agnostic lookup.
2020-07-22 16:19:24 -07:00
Fangrui Song 27650ec554 Revert D81682 "[PGO] Extend the value profile buckets for mem op sizes."
This reverts commit 4a539faf74.

There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:

```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```

(The body of __llvm_profile_instrument_range is inlined, so we can only find__llvm_profile_instrument_target in the trace)

```
 23│    0x000055555dba0961 <+65>:    nopw   %cs:0x0(%rax,%rax,1)
 24│    0x000055555dba096b <+75>:    nopl   0x0(%rax,%rax,1)
 25│    0x000055555dba0970 <+80>:    mov    %rsi,%rbx
 26│    0x000055555dba0973 <+83>:    mov    0x8(%rsi),%rsi  # %rsi=-1 -> SIGSEGV
 27│    0x000055555dba0977 <+87>:    cmp    %r15,(%rbx)
 28│    0x000055555dba097a <+90>:    je     0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
2020-07-22 16:08:25 -07:00
Amy Kwan 5f11027395 [PowerPC][Power10] Fix vins*vlx instructions to have i32 arguments.
Previously, the vins*vlx instructions were incorrectly defined with i64 as the
second argument. This patches fixes this issue by correcting the second argument
of the vins*vlx instructions/intrinsics to be i32.

Differential Revision: https://reviews.llvm.org/D84277
2020-07-22 17:58:14 -05:00
Craig Topper deeb2fdbf4 [X86] Remove a couple temporary std::string for CPU names that I don't need to exist.
The input to these functions is a StringRef. We then convert it
to a std::string. Then maybe replace with "generic". I think we
can just overwrite the incoming StringRef with "generic" if needed
and then pass it along without creating any std::string.
2020-07-22 15:55:04 -07:00
Rahul Joshi ed88cd77d4 [NFC] Simplify `splitLiteralAndReplacement` function
- Eliminate `From` which is 0 most of the times.
- Replace 'find_first_of('{') != 0' with 'front() != '{'
- Simplify the loop body given the it executes only when front() == '}'

Differential Revision: https://reviews.llvm.org/D84178
2020-07-22 15:32:32 -07:00
Christopher Tetreault 23c5e59d9f [SVE] Remove calls to VectorType::getNumElements from Analysis
Reviewers: efriedma, fpetrogalli, c-rhodes, asbirlea, RKSimon

Reviewed By: RKSimon

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81504
2020-07-22 15:19:05 -07:00
Logan Smith 97a0f80c46 Revert "Try enabling -Wsuggest-override again, using add_compile_options instead of add_compile_definitions for disabling it in unittests/ directories."
This reverts commit 388c9fb1af.
2020-07-22 15:07:01 -07:00
Rong Xu 50da55a585 [PGO] Supporting code for always instrumenting entry block
This patch includes the supporting code that enables always
instrumenting the function entry block by default.

This patch will NOT the default behavior.

It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.

This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.

Differential Revision: https://reviews.llvm.org/D84261
2020-07-22 15:01:53 -07:00
Fangrui Song dbdda8232a Revert D84057 "[PGO][PGSO] Remove a temporary flag used for gradual rollout."
This reverts commit e64afefdf8. It caused
a PGO bootstrapped clang to crash on many source files.

`__llvm_profile_instrument_range` seems to trigger a null pointer dereference.

Call stack:
__llvm_profile_instrument_range
llvm::APInt::udiv(llvm::APInt const&) const
getRangeForAffineARHelper
2020-07-22 14:28:28 -07:00
Christopher Tetreault ae35c09c34 [MVT] Fix getTypeForEVT for v64f16 and v128f16
Summary: These should have half float as the element type

Reviewers: cameron.mcinally, efriedma, sdesmalen, paulwalker-arm

Reviewed By: paulwalker-arm

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84211
2020-07-22 14:27:08 -07:00
Logan Smith 388c9fb1af Try enabling -Wsuggest-override again, using add_compile_options instead of add_compile_definitions for disabling it in unittests/ directories.
Using add_compile_definitions caused the flag to be passed to rc.exe on Windows and thus broke Windows builds.
2020-07-22 14:19:34 -07:00
David Blaikie 5c2451785d DebugInfo: Use debug_line.dwo for debug_macro.dwo
This is an alternative proposal to D81476 (and D82084) - the details were sufficiently confusing to me it seemed easier to write some code and see how it looks.

Reviewers: SouraVX

Differential Revision: https://reviews.llvm.org/D84278
2020-07-22 14:06:33 -07:00
Fangrui Song 5724c8ba29 Temporarily revert D83903 "[PGO] Enable the extended value profile buckets for mem op sizes."
`__llvm_profile_instrument_memop` transitively calls calloc, thus calloc
should not be instrumented.

I saw a
`calloc -> __llvm_profile_instrument_memop -> calloc -> __llvm_profile_instrument_memop -> ...`
infinite loop leading to stack overflow
when the malloc implementation (e.g. tcmalloc) is built and instrumented along with the application.

We should figure out the library calls which may be instrumented and disable
their instrumentation before rolling out this change.

Reviewed By: yamauchi

Differential Revision: https://reviews.llvm.org/D84358
2020-07-22 13:12:19 -07:00
Mircea Trofin 111a018b36 [llvm][NFC] const-ed MachineBlockFrequencyInfo::isIrrLoopHeader 2020-07-22 13:06:34 -07:00
David Green 411eb87c79 [ARM] Fix missing MVE_VMUL_qr predicate
This was missed out of 1030e82598, but hopefully fixes the issues
reported with NEON accidentally generating MVE instructions.
2020-07-22 20:43:02 +01:00
Andrew Litteken bcbc6117b5 [CGP] Add Pass Dependencies
Add pass dependecies:
  - TargetTransformInfoWrapperPass
  - TargetPassConfig
  - LoopInfoWrapperPass
  - TargetLibraryInfoWrapperPass

To fix inconsistencies when passes are added to the pipeline.

Reviewers: efriedma, kmclaughlin, paquette

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84346
2020-07-22 12:02:53 -07:00
Amy Kwan 08b4a50e39 [PowerPC][Power10] Fix the Test LSB by Byte (xvtlsbb) Builtins Implementation
The implementation of the xvtlsbb builtins/intrinsics were not correct as the
intrinsics previously used i1 as an argument type. This patch changes the i1
argument type used in these intrinsics to be i32 instead, as having the second
as an i1 can lead to issues in the backend.

Differential Revision: https://reviews.llvm.org/D84291
2020-07-22 13:27:05 -05:00
Simon Pilgrim 1c060aa988 DwarfCompileUnit.cpp - remove duplicate includes that already exist in DwarfCompileUnit.h. NFC.
Also remove DIE.h include from DwarfCompileUnit.h and replace with forward declarations.
2020-07-22 19:25:27 +01:00
Simon Pilgrim cd0a36bbda CodeViewDebug.cpp - remove duplicate includes that already exist in CodeViewDebug.h. NFC. 2020-07-22 19:25:27 +01:00
Hans Wennborg 3eec657825 Revert "Enable -Wsuggest-override in the LLVM build" and the follow-ups.
After lots of follow-up fixes, there are still problems, such as
-Wno-suggest-override getting passed to the Windows Resource Compiler
because it was added with add_definitions in the CMake file.

Rather than piling on another fix, let's revert so this can be re-landed
when there's a proper fix.

This reverts commit 21c0b4c1e8.
This reverts commit 81d68ad27b.
This reverts commit a361aa5249.
This reverts commit fa42b7cf29.
This reverts commit 955f87f947.
This reverts commit 8b16e45f66.
This reverts commit 308a127a38.
This reverts commit 274b6b0c7a.
This reverts commit 1c7037a2a5.
2020-07-22 20:23:58 +02:00
Matt Arsenault d26526fd09 AArch64: Use Register 2020-07-22 14:14:44 -04:00
Matt Arsenault 0c92bfa4b8 GlobalISel: Don't use virtual for distinguishing arg handlers
There's no reason to involve the hassle of a virtual method targets
have to override for a simple boolean.

Not sure exactly what's going on with Mips, but it seems to define its
own totally separate handler classes.
2020-07-22 14:14:43 -04:00
Matt Arsenault 6f437117af AMDGPU: Don't assert on f16 inv2pi immediates pre-gfx8
v_cvt_f32_f16 can still accept this value as a literal constant. This
showed up in GlobalISel since it doesn't have constant folding for
G_FPEXT.
2020-07-22 13:59:03 -04:00
Matt Arsenault b98f902f18 GlobalISel: Restructure argument lowering loop in handleAssignments
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.

I'm also not sure what the correct behavior for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
2020-07-22 13:31:11 -04:00
Matt Arsenault 1fd1beea18 AMDGPU/GlobalISel: Fix translation of indirect calls 2020-07-22 13:13:21 -04:00
Tarindu Jayatilaka 418121c30a Reapply "Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis"
(This reverts commit a5e0194709, and
corrects author).

Rename the pass to be able to extend it to function properties other than inliner features.

    Reviewed By: mtrofin

    Differential Revision: https://reviews.llvm.org/D82044
2020-07-22 10:07:35 -07:00
Gui Andrade 33d239513c [MSAN] Instrument libatomic load/store calls
These calls are neither intercepted by compiler-rt nor is libatomic.a
naturally instrumented.

This patch uses the existing libcall mechanism to detect a call
to atomic_load or atomic_store, and instruments them much like
the preexisting instrumentation for atomics.

Calls to _load are modified to have at least Acquire ordering, and
calls to _store at least Release ordering. Because this needs to be
converted at runtime, msan injects a LUT (implemented as a vector
with extractelement).

Differential Revision: https://reviews.llvm.org/D83337
2020-07-22 16:45:06 +00:00
Mircea Trofin a5e0194709 Revert "Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis"
This reverts commit 44a6bda19b. I forgot
to correctly attibute it to tarinduj. Fixing and resubmitting.
2020-07-22 09:42:17 -07:00
David Green 8fa824d7a3 [ARM] Add predicated add reduction patterns
Given a vecreduce.add(select(p, x, 0)), we can convert that to a
predicated vaddv, as the else value for the select is the identity
value, a zero. That is what this patch does for the vaddv, vaddva,
vaddlv and vaddlva instructions, copying the existing patterns to also
handle predication through a select.

Differential Revision: https://reviews.llvm.org/D84101
2020-07-22 17:30:02 +01:00
Mircea Trofin 44a6bda19b Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis
Rename the pass to be able to extend it to function properties other than inliner features.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82044
2020-07-22 09:24:15 -07:00
Dmitry Preobrazhensky 0b8fd77ad9 [AMDGPU][MC] Corrected decoding of 16-bit literals
16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled.

For example, the following code

0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff

was decoded as

v_mul_lo_u16_e32 v2, 0xffffabcd, v2

However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe.

This change corrects decoding to make reassembly possible.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84098
2020-07-22 17:20:43 +03:00
Stefan Pintilie a60251d739 [PowerPC] Add linker opt for PC Relative GOT indirect accesses
A linker optimization is available on PowerPC for GOT indirect PCRelative loads.

The idea is that we can mark a usual GOT indirect load:

pld 3, vec@got@pcrel(0), 1
lwa 3, 4(3)

With a relocation to say that if we don't need to go through the GOT we can let
the linker further optimize this and replace a load with a nop.

  pld 3, vec@got@pcrel(0), 1
.Lpcrel1:
.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
  lwa 3, 4(3)

This patch adds the logic that allows the compiler to add the R_PPC64_PCREL_OPT.

Reviewers: nemanjai, lei, hfinkel, sfertile, efriedma, tstellar, grosbach

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D79864
2020-07-22 09:08:23 -05:00
jasonliu b98b1700ef [XCOFF] Enable symbol alias for AIX
Summary:
AIX assembly's .set directive is not usable for aliasing purpose.
We need to use extra-label-at-defintion strategy to generate symbol
aliasing on AIX.

Reviewed By: DiggerLin, Xiangling_L

Differential Revision: https://reviews.llvm.org/D83252
2020-07-22 14:03:55 +00:00
Sebastian Neubauer 2a6c871596 [InstCombine] Move target-specific inst combining
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00
Simon Pilgrim fa95688237 SelectionDAGBuilder.cpp - remove duplicate includes that already exist in SelectionDAGBuilder.h. NFC. 2020-07-22 14:19:41 +01:00
David Green f8abecf337 [ARM] Extra MVE select(binop) patterns
This is very similar to 243970d03cace2, but handling a slightly
different form of predicated operations. When starting with a pattern of
the form select(p, BinOp(x, y), x), Instcombine will often transform
this to BinOp(x, select(p, y, 0)), where 0 is the identity value of the
binop (0 for adds/subs, 1 for muls, -1 for ands etc). This adds the
patterns that transforms those back into predicated binary operations.

There is also a very minor adjustment to tablegen null_frag in here, to
allow it to also be recognized as a PatLeaf node, so that it can be used
in MVE_TwoOpPattern to easily exclude the cases where we do not need the
alternate transform.

Differential Revision: https://reviews.llvm.org/D84091
2020-07-22 14:08:29 +01:00
David Green 3533e0a08d [ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z)
Most MVE instructions can be predicated to fold a select into the
instruction, using the predicate and the selects else as a passthough.
This adds tablegen patterns for most two operand instructions using the
newly added TwoOpPattern from 1030e82598.

Differential Revision: https://reviews.llvm.org/D83222
2020-07-22 13:24:01 +01:00
OCHyams ce6de3747b [DebugInfo] Drop location ranges for variables which exist entirely outside the variable's scope
Summary:
This patch reduces file size in debug builds by dropping variable locations a
debugger user will not see.

After building the debug entity history map we loop through it. For each
variable we look at each entry. If the entry opens a location range which does
not intersect any of the variable's scope's ranges then we mark it for removal.
After visiting the entries for each variable we also mark any clobbering
entries which will no longer be referenced for removal, and then finally erase
the marked entries. This all requires the ability to query the order of
instructions, so before this runs we number them.

Tests:
Added llvm/test/DebugInfo/X86/trim-var-locs.mir

Modified llvm/test/DebugInfo/COFF/register-variables.ll
  Branch folding merges the tails of if.then and if.else into if.else. Each
  blocks' debug-locations point to different scopes so when they're merged we
  can't use either. Because of this the variable 'c' ends up with a location
  range which doesn't cover any instructions in its scope; with the patch
  applied the location range is dropped and its flag changes to IsOptimizedOut.

Modified llvm/test/DebugInfo/X86/live-debug-variables.ll
Modified llvm/test/DebugInfo/ARM/PR26163.ll
  In both tests an out of scope location is now removed. The remaining location
  covers the entire scope of the variable allowing us to emit it as a single
  location.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D82129
2020-07-22 12:45:21 +01:00
Sebastian Neubauer 2c659082bd [AMDGPU] Don't combine memory intrs to v3i16
v3i16 and v3f16 currently cannot be legalized and lowered so they should
not be emitted by inst combining.

Moved the check down to still allow extracting 1 or 2 elements via the dmask.

Fixes image intrinsics being combined to return v3x16.

Differential Revision: https://reviews.llvm.org/D84223
2020-07-22 12:44:01 +02:00
Chen Zheng 36f9fe2d34 [PowerPC] fixupIsDeadOrKill start and end in different block fixing
In fixupIsDeadOrKill, we assume StartMI and EndMI not exist in same
basic block, so we add an assertion in that function. This is wrong
before RA, as before RA the true definition may exist in another
block through copy like instructions.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D83365
2020-07-22 06:27:13 -04:00
Sander de Smalen bef56f7fe2 [AArch64][SVE] Correctly allocate scavenging slot in presence of SVE.
This patch addresses two issues:

* Forces the availability of the base-pointer (x19) when the frame has
  both scalable vectors and variable-length arrays. Otherwise it will
  be expensive to access non-SVE locals.

* In presence of SVE stack objects, it will allocate the emergency
  scavenging slot close to the SP, so that they can be accessed from
  the SP or BP if available. If accessed from the frame-pointer, it will
  otherwise need an extra register to access the scavenging slot because
  of mixed scalable/non-scalable addressing modes.

Reviewers: efriedma, ostannard, cameron.mcinally, rengolin, david-arm

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70174
2020-07-22 10:50:36 +01:00
Stefan Pintilie e0a372ff10 [PowerPC] Extend .reloc directive on PowerPC
When the compiler generates a GOT indirect load it must generate two loads. One
that loads the address of the element from the GOT and a second to load the
actual element based on the address just loaded from the GOT. However, the
linker can optimize these two loads into one load if it knows that it is safe
to do so. The compiler can tell the linker that the optimization is safe
by using the R_PPC64_PCREL_OPT relocation.

This patch extends the .reloc directive to allow the following setup

  pld 3, vec@got@pcrel(0), 1
.Lpcrel1=.-8
      ... More instructions possible here ...
.reloc .Lpcrel1,R_PPC64_PCREL_OPT,.-.Lpcrel1
  lwa 3, 4(3)

Reviewers: nemanjai, lei, hfinkel, sfertile, efriedma, tstellar, grosbach, MaskRay

Reviewed By: nemanjai, MaskRay

Differential Revision: https://reviews.llvm.org/D79625
2020-07-22 04:25:54 -05:00
Simon Wallis 94e4e37d55 [Thumb] set code alignment for 16-bit load from constant pool
Summary:
[Thumb] set code alignment for 16-bit load from constant pool

LLVM miscompiles this code when compiling for a target with v8.2-A FP16 and the Thumb ISA at -O0:

extern void bar(__fp16 P5);

int main() {
  __fp16 P5 = 1.96875;
  bar(P5);
}

The code section containing main has 2 byte alignment.
It needs to have 4 byte alignment,
because the load literal instruction has an offset from the
load address with the low 2 bits zeroed.

I do not include a test case in this check-in.
llc and llvm-mc do not exhibit this bug. They do not set code section alignment
in the same manner as clang.

Reviewers: dnsampaio

Reviewed By: dnsampaio

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84169
2020-07-22 10:12:41 +01:00
Sjoerd Meijer 5567c62afa [Matrix] Add LowerMatrixIntrinsics to the NPM
Pass LowerMatrixIntrinsics wasn't running yet running under the new pass
manager, and this adds LowerMatrixIntrinsics to the pipeline (to the
same place as where it is running in the old PM).

Differential Revision: https://reviews.llvm.org/D84180
2020-07-22 09:47:53 +01:00
Max Kazantsev b96114c1e1 [SCEV] Remove premature assert. PR46786
This assert was added to verify assumption that GEP's SCEV will be of pointer type,
basing on fact that it should be a SCEVAddExpr with (at least) last operand being
pointer. Two notes:
- GEP's SCEV does not have to be a SCEVAddExpr after all simplifications;
- In current state, GEP's SCEV does not have to have at least one pointer operands
  (all of them can become int during the transforms).

However, we might want to be at a point where it is true. We are currently removing
this assert and will try to enumerate the cases where "is pointer" notion might be
lost during the transforms. When all of them are fixed, we can return it.

Differential Revision: https://reviews.llvm.org/D84294
Reviewed By: lebedev.ri
2020-07-22 15:43:16 +07:00
Petar Avramovic 44967fc604 AMDGPU: Simplify f16 to i64 custom lowering
Range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64
since f32->i64 has long expansion.

Differential Revision: https://reviews.llvm.org/D84166
2020-07-22 10:32:14 +02:00
David Spickett 3a34194606 [ARM] Fix Asm/Disasm of TBB/TBH instructions
Summary:
This fixes Bugzilla #46616 in which it was reported
that "tbb  [pc, r0]" was marked as SoftFail
(aka unpredictable) incorrectly.

Expected behaviour is:
* ARMv8 is required to use sp as rn or rm
  (tbb/tbh only have a Thumb encoding so using Arm mode
  is not an option)
* If rm is the pc then the instruction is always
  unpredictable

Some of this was implemented already and this fixes the
rest. Added tests cover the new and pre-existing handling.

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84227
2020-07-22 09:31:56 +01:00
Max Kazantsev 360ab70712 [SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).

Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
2020-07-22 13:53:50 +07:00
Kai Luo c3f9697f1f [PowerPC] Fix wrong codegen when stack pointer has to realign performing dynalloc
Current powerpc backend generates wrong code sequence if stack pointer
has to realign if `-fstack-clash-protection` enabled. When probing
dynamic stack allocation, current `PREPARE_PROBED_ALLOCA` takes
`NegSizeReg` as input and returns
`FinalStackPtr`. `FinalStackPtr=StackPtr+ActualNegSize` is calculated
correctly, however code following `PREPARE_PROBED_ALLOCA` still uses
value of `NegSizeReg`, which does not contain `ActualNegSize` if
`MaxAlign > TargetAlign`, to calculate loop trip count and residual
number of bytes.

This patch is part of fix of
https://bugs.llvm.org/show_bug.cgi?id=46759.

Differential Revision: https://reviews.llvm.org/D84152
2020-07-22 06:35:12 +00:00
Kai Luo 8912252252 [PowerPC] Fix wrong codegen when stack pointer has to realign in prologue
Current powerpc backend generates wrong code sequence if stack pointer
has to realign if -fstack-clash-protection enabled. When probing in
prologue, backend should generate a subtraction instruction rather
than a `stux` instruction to realign the stack pointer.

This patch is part of fix of
https://bugs.llvm.org/show_bug.cgi?id=46759.

Differential Revision: https://reviews.llvm.org/D84218
2020-07-22 06:35:12 +00:00
Kang Zhang 9bbf0ecff3 [PowerPC] Fix the implicit operands in PredicateInstruction()
Summary:
In the function `PPCInstrInfo::PredicateInstruction()`, we will replace
non-Predicate Instructions to Predicate Instruction. But we forget add
the new implicit operands the new Predicate Instruction needed. This
patch is to fix this.

Reviewed By: jsji, efriedma

Differential Revision: https://reviews.llvm.org/D82390
2020-07-22 05:51:03 +00:00
Xing GUO 8632931787 [DWARFYAML] Make the length field of compilation units optional. NFC.
This patch makes the length field of compilation units optional (0 by
default).
2020-07-22 12:16:19 +08:00
Chen Zheng e8425b27fe [PowerPC] add store (load float*) pattern to isProfitableToHoist
store (load float*) can be optimized to store(load i32*) in InstCombine pass.

Add store (load float*) to isProfitableToHoist to make sure we don't break
the opt in InstCombine pass.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D82341
2020-07-21 20:55:13 -04:00
Juneyoung Lee ace0bf7490 [ValueTracking] Fix incorrect handling of canCreateUndefOrPoison
.. in isGuaranteedNotToBeUndefOrPoison.

This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return
imprecise result.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84251
2020-07-22 09:31:16 +09:00
Amy Huang 0881d0bed3 [PDB][NativeSession] Clean up some things in NativeSession.
-Use the actual sect/offset to keep track of symbols in the cache so they don't get created multiple times with different addresses.
-Remove getSymTag from PDBFunctionSymbol/PDBPublicSymbol because it's already implemented in the base class
-Merge the symbolizer test files for DIA and native, since the tests are the same.
-Implement getCompilandId for NativeLineNumber

Reviewed By: amccarth

Differential Revision: https://reviews.llvm.org/D84208
2020-07-21 16:54:52 -07:00
Matt Arsenault bf6bc62d1f GlobalISel: Use Register and update comment physical register syntax 2020-07-21 19:11:57 -04:00
Amy Kwan 1eb279d2a8 [PowerPC][Power10] Add Vector Multiply/Mod/Divide Instruction Definitions and MC Tests
This patch adds the td definitions and asm/disasm tests for the following instructions:
- Vector Multiply Low Doubleword: vmulld
- Vector Modulus Word/Doubleword: vmodsw, vmoduw, vmodsd, vmodud
- Vector Divide Word/Doubleword: vdivsw, vdivuw, vdivsd, vdivud
- Vector Multiply High Word/Doubleword: vmulhsw, vmulhsd, vmulhuw, vmulhud
- Vector Divide Extended Word/Doubleword: vdivesw, vdiveuw, vdivesd, vdiveud

Differential Revision: https://reviews.llvm.org/D82929
2020-07-21 18:05:35 -05:00
Amara Emerson 791544422a Revert "[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy"
This reverts commit 64eb3a4915.

It caused miscompiles with optimizations enabled. Reverting while I investigate.
2020-07-21 16:01:18 -07:00
Amara Emerson f1ae96d9bf [AArch64][GlobalISel] Fix TLS accesses clobbering registers incorrectly.
This was happening because the BLR didn't have a use of the X0 arg register,
which would end up being re-used in high reg pressure situations.
The change also avoids hard coding the use of X0 for the sequence except to
copy the value for the call. ld64 should still be able to optimize it.

rdar://65438258
2020-07-21 16:01:17 -07:00
Matt Arsenault b258920095 AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT 2020-07-21 18:29:41 -04:00
Matt Arsenault 7cd8a0256d GlobalISel: Legalize G_FPOWI 2020-07-21 18:13:04 -04:00
Matt Arsenault 7941dc5041 GlobalISel: Translate llvm.powi intrinsic
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
2020-07-21 18:13:04 -04:00
Matt Arsenault 1168119c2f AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.
2020-07-21 18:11:22 -04:00
Matt Arsenault f659c44016 CodeGen: Add support for lowering byref attribute 2020-07-21 17:38:15 -04:00
Simon Pilgrim 5b5dc2442a [X86][AVX] getTargetShuffleMask - don't decode VBROADCAST(EXTRACT_SUBVECTOR(X,0)) patterns.
getTargetShuffleMask is used by the various "SimplifyDemanded" folds so we can't assume that the bypassed extract_subvector can be safely simplified - getFauxShuffleMask performs a more general decode that allows us to more safely catch many of these cases so the impact is minimal.
2020-07-21 21:55:44 +01:00
Matt Arsenault 2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.

The code is is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
diggerlin 11546898e2 [AIX][XCOFF]emit extern linkage for the llvm intrinsic symbol
SUMMARY:

when we call memset, memcopy,memmove etc(this are llvm intrinsic function) in the c source code. the llvm will generate IR
like call call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8*), i8 %1, i32 %2, i1 false)
for c source code
bash> cat test_memset.call

struct S{
 int a;
 int b;
};
extern struct  S s;
void bar() {
  memset(&s, s.b, s.b);
}
like

%struct.S = type { i32, i32 }
@s = external global %struct.S, align 4
; Function Attrs: noinline nounwind optnone
define void @bar() #0 {
entry:
  %0 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
  %1 = trunc i32 %0 to i8
  %2 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
  call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8*), i8 %1, i32 %2, i1 false)
  ret void
}
declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
If we want to let the aix as assembly compile pass without -u
it need to has following assembly code.
.extern .memset
(we do not output extern linkage for llvm instrinsic function.
even if we output the extern linkage for llvm intrinsic function, we should not out .extern llvm.memset.p0i8.i32,
instead of we should emit .extern memset)

for other llvm buildin function floatdidf . even if we do not call these function floatdidf in the c source code(the generated IR also do not the call __floatdidf . the function call
was generated in the LLVM optimized.
the function is not in the functions list of Module, but we still need to emit extern .__floatdidf

The solution for it as :
We record all the lllvm intrinsic extern symbol when transformCallee(), and emit all these symbol in the AsmPrinter::doFinalization(Module &M)

Reviewers:  jasonliu, Sean Fertile, hubert.reinterpretcast,

Differential Revision: https://reviews.llvm.org/D78929
2020-07-21 16:03:04 -04:00
Fangrui Song 8a268bec1b Revert D82927 "[Loop Fusion] Integrate Loop Peeling into Loop Fusion"
This reverts commit bb8850d34d.

It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.

LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
2020-07-21 12:24:50 -07:00
David Green 1030e82598 [ARM] Add MVE_TwoOpPattern. NFC
This commons out a chunk of the different two operand MVE patterns into
a single helper multidef. Or technically two multidef patterns so that
the Dup qr patterns can also get the same treatment. This is most of the
two address instructions that we have some codegen pattern for (not ones
that we select purely from intrinsics). It does not include shifts,
which are more spread out and will need some extra work to be given the
same treatment.

Differential Revision: https://reviews.llvm.org/D83219
2020-07-21 19:51:37 +01:00
Guozhi Wei 28759e9fcc [MBP] Use profile count to compute tail dup cost if it is available
Current tail duplication in machine block placement pass uses block frequency
information in cost model. But frequency number has only relative meaning
compared to other basic blocks in the same function. A large frequency number
doesn't mean it is hot and a small frequency number doesn't mean it is cold.

To overcome this problem, this patch uses profile count in cost model if it's
available. So we can tail duplicate real hot basic blocks.

Differential Revision: https://reviews.llvm.org/D83265
2020-07-21 11:18:06 -07:00
Hiroshi Yamauchi 7bedae7dee [PGO][PGSO] Add profile guided size optimization to loop vectorization legality. 2020-07-21 11:16:36 -07:00
Serge Pavlov ac0edc5588 Revert "[Windows] Fix limit on command line size"
This reverts commit d4020ef7c4. It broke
LLDB buildbot: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/17702.
2020-07-22 01:00:32 +07:00
Nico Weber 4fe912f186 Build: Move TF source file inclusion from build system to source files
Outside of compiler-rt (where it's arguably an anti-pattern too),
LLVM tries to keep its build files as simple as possible. See e.g.
llvm/docs/SupportLibrary.rst, "Code Organization".

Differential Revision: https://reviews.llvm.org/D84243
2020-07-21 13:02:34 -04:00
Arthur Eubanks b13b858182 [NewPM] Support optnone under new pass manager
OptNoneInstrumentation is part of StandardInstrumentations. It skips
functions (or loops) that are marked optnone.

The feature of skipping optional passes for optnone functions under NPM
is gated on a -enable-npm-optnone flag. Currently it is by default
false. That is because we still need to mark all required passes to be
required. Otherwise optnone functions will start having incorrect
semantics.  After that is done in following changes, we can remove the
flag and always enable this.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D83519
2020-07-21 09:53:43 -07:00
Jordan Rupprecht 1ee1da1ea5 [NFC] Fix unused var warning 2020-07-21 09:26:01 -07:00
Sidharth Baveja bb8850d34d [Loop Fusion] Integrate Loop Peeling into Loop Fusion
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.

Here is a simple scenario peeling can be used in loop fusion:

for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;

Here is we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.

a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}

Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.

Author: sidbav (Sidharth Baveja)

Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour

Reviewed By: bmahjour

Subscribers: bmahjour, mgorny, hiraditya, zzheng

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D82927
2020-07-21 15:59:14 +00:00
Jon Roelofs dc09c65f63 LoopIdiomRecognize: use ExpandedValuesCleaner in another place
This is a necessary cleanup after having expanded a SCEV.

See: https://reviews.llvm.org/D84071#inline-774728

Differential Revision: https://reviews.llvm.org/D84174
2020-07-21 09:32:23 -06:00
Jon Roelofs 4d75cc4b0a More conservatively report status from LoopIdiomRecognize
Being "precise" here is getting us into trouble with one of the
EXPENSIVE_CHECKS buildbots, see [1].  Rather than reporting IR additions that
later get rolled back as "no change", instead we now conservatively report that
there was.

1: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143509.html

Differential Revision: https://reviews.llvm.org/D84071
2020-07-21 09:32:22 -06:00
Sander de Smalen 9bacf15885 [AArch64][SVE] Fix PCS for functions taking/returning scalable types.
The default calling convention needs to save/restore the SVE callee
saves according to the SVE PCS when the function takes or returns
scalable types, even when the `aarch64_sve_vector_pcs` CC is not
specified for the function.

Reviewers: efriedma, paulwalker-arm, david-arm, rengolin

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84041
2020-07-21 15:55:39 +01:00
Petre-Ionut Tudor 1af9fc8213 [ARM] Generate [SU]HADD from ((a + b) >> 1)
Summary:
Teach LLVM to recognize the above pattern, where the operands are
either signed or unsigned types.

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83777
2020-07-21 13:22:07 +01:00
Jay Foad 5e5bda74b6 [IR] Simplify Use::swap. NFCI.
The new implementation makes it clear that there are exactly two
conditional stores (after the initial no-op optimization). By contrast
the old implementation had seven conditionals, some hidden inside other
functions.

This commit can change the order of operands in operand lists, hence the
tweak to one test case.

Differential Revision: https://reviews.llvm.org/D80116
2020-07-21 12:15:12 +01:00
David Green becaa6803a [ARM] Constant fold VCTP intrinsics
We can sometimes get into the situation where the operand to a vctp
intrinsic becomes constant, such as after a loop is fully unrolled. This
adds the constant folding needed for them, allowing them to simplify
away and hopefully simplifying remaining instructions.

Differential Revision: https://reviews.llvm.org/D84110
2020-07-21 11:39:31 +01:00
Nico Weber e37b220442 [gn build] (manually) hack around 70f8d0ac8a 2020-07-21 06:35:36 -04:00
Serge Pavlov d4020ef7c4 [Windows] Fix limit on command line size
Documentation on CreateProcessW states that maximal size of command line
is 32767 characters including ternimation null character. In the
function llvm::sys::commandLineFitsWithinSystemLimits this limit was set
to 32768. As a result if command line was exactly 32768 characters long,
a response file was not created and CreateProcessW was called with
too long command line.

Differential Revision: https://reviews.llvm.org/D83772
2020-07-21 17:33:22 +07:00
Florian Hahn 752fea7c27 [SCCP] Add range metadata to call sites with known return ranges.
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.

Reviewers: efriedma, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D83952
2020-07-21 10:06:54 +01:00
David Green 30371df85f [ARM] More unpredictable VCVT instructions.
These extra vcvt instructions were missed from 74ca67c109 because they
live in a different Domain, but should be treated in the same way.

Differential Revision: https://reviews.llvm.org/D83204
2020-07-21 07:24:37 +01:00
David Blaikie 38fbba4cb8 DebugInfo: Move getMD5AsBytes from DwarfUnit to DwarfDebug
It wasn't using any state from DwarfUnit anyway.
2020-07-20 19:21:39 -07:00
Matt Arsenault 1ef3ed0eb4 GlobalISel: Rewrite getLCMType
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
2020-07-20 21:06:30 -04:00
Matt Arsenault 12d5bec8c7 GlobalISel: Handle more cases in getGCDType
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.

Also make the handling mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
2020-07-20 20:53:35 -04:00
Matt Arsenault 107c954c13 AMDGPU/GlobalISel: Remove unnecessary parameter 2020-07-20 20:53:01 -04:00
Artem Belevich bf66003a4f [MC,NVPTX] Add MCAsmPrinter support for unsigned-only data directives.
PTX does not support negative values in .bNN data directives and we must
typecast such values to unsigned before printing them.

MCAsmInfo can now specify whether such casting is necessary for particular
target.

Differential Revision: https://reviews.llvm.org/D83423
2020-07-20 16:24:41 -07:00
Lang Hames 574713c307 [ExecutionEngine] Initialize near block hint in SectionMemoryManager.
When allocating a new memory block in SectionMemoryManager, initialize
the Near hint for the other memory groups if they have not been set
already.

Patch by Dana Koch. Thanks Dana!
2020-07-20 14:40:54 -07:00
Logan Smith 308a127a38 [llvm][unittest] Add -Wno-suggest-override to more infrastructure that includes googletest/googlemock headers 2020-07-20 13:59:39 -07:00
Louis Dionne 78f543e5a1 [NFC] Use std::free instead of ::free
Since we include <cstdlib> instead of <stdlib.h>, it makes sense to
use std::free.
2020-07-20 16:19:08 -04:00
Sanjay Patel 750f4c591d [InstCombine] allow peeking through zext of shift amount to match rotate idioms (PR45701)
We might want to also allow trunc of the shift amount, but that seems less likely?

  define i32 @src(i32 %x, i1 %y) {
  %0:
    %rem = and i1 %y, 1
    %cmp = icmp eq i1 %rem, 0
    %sh_prom = zext i1 %rem to i32
    %sub = sub nsw nuw i1 0, %rem
    %sh_prom1 = zext i1 %sub to i32
    %shr = lshr i32 %x, %sh_prom1
    %shl = shl i32 %x, %sh_prom
    %or = or i32 %shl, %shr
    %r = select i1 %cmp, i32 %x, i32 %or
    ret i32 %r
  }
  =>
  define i32 @tgt(i32 %x, i1 %y) {
  %0:
    %t = zext i1 %y to i32
    %r = fshl i32 %x, i32 %x, i32 %t
    ret i32 %r
  }

  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/xgMvE3

http://bugs.llvm.org/PR45701
2020-07-20 16:18:11 -04:00
Florian Hahn f13a59bcff [Matrix] Use TileInfo to create tiled loop nest for matrix multiply.
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrixes.

Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81308
2020-07-20 21:11:53 +01:00
Eli Friedman b8f765a1e1 [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Hiroshi Yamauchi 9f5d8e8a72 [PGO] Enable the extended value profile buckets for mem op sizes.
Following up D81682 and enable the new, extended value profile buckets for mem
op sizes.

Differential Revision: https://reviews.llvm.org/D83903
2020-07-20 12:05:09 -07:00
Hiroshi Yamauchi e64afefdf8 [PGO][PGSO] Remove a temporary flag used for gradual rollout.
Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.

Differential Revision: https://reviews.llvm.org/D84057
2020-07-20 11:12:11 -07:00
Mircea Trofin 70f8d0ac8a [llvm] Development-mode InlineAdvisor
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:

 - loading models via a command-line parameter, thus allowing for rapid
 training iteration, where models can be used for the next exploration
 phase without requiring recompiling the compiler. This trades off some
 compilation speed for the added flexibility.

 - collecting training logs, in the form of tensorflow.SequenceExample
 protobufs. We generate these as textual protobufs, which simplifies
 generation and testing. The protobufs may then be readily consumed by a
 tensorflow-based training algorithm.

To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Reviewers: jdoerfert, davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83733
2020-07-20 11:01:56 -07:00
Florian Hahn e1270b16c9 [Matrix] Add TileInfo abstraction for tiled matrix code-gen.
This patch adds a TileInfo abstraction and utilities to
create a 3-level loop nest for tiling.

Reviewers: anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D77550
2020-07-20 18:49:08 +01:00
Yuanfang Chen ca1e69a675 [NFC] remove unused includes of SelectionDAGISel.h 2020-07-20 10:43:29 -07:00
Yuanfang Chen efcb8a1903 [NFC] remove unneeded TargetLoweringObjectFile init after 85c30f3374 2020-07-20 10:43:28 -07:00
Yuanfang Chen 589c646a7e [llc] (almost) remove `--print-machineinstrs`
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel which could not be done with
`-print-after`/`-stop-after` since isel pass does not have commandline name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to new pass manager since isel pass would have a
commandline text name to use `print-after` or equivalent switches.

The motivation of this patch is to reduce tests dependency on
would-be-deprecated feature.

Reviewed By: arsenm, dsanders

Differential Revision: https://reviews.llvm.org/D83275
2020-07-20 10:43:28 -07:00
Matt Arsenault ce76d15a70 AMDGPU: Use MCRegister for preloaded arguments
Attempt to fix build error with ancient GCC
2020-07-20 13:34:28 -04:00
Nick Desaulniers b3031593ea [ThinLTO] parse flags and blockcount summaries
Forked from pr/46523, we were having a hard time running llvm-extract on
IR from a thinLTO build of the Linux kernel.

$ llvm-extract --func jeq_imm jit-42f488b63a04fdaa931315bdadecb6d23e20529a.ll
llvm-extract: jit-42f488b63a04fdaa931315bdadecb6d23e20529a.ll:47463:8:
error: Expected 'gv', 'module', or 'typeid' at the start of summary
entry
^209 = flags: 8
       ^

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D82917
2020-07-20 09:50:22 -07:00
Matt Arsenault 21ef01b7e3 AMDGPU: Remove outdated fixme 2020-07-20 11:41:41 -04:00
Matt Arsenault 84704d989b AMDGPU: Fix not accounting for constantexpr uses of LDS globals
This was failing to add the size of LDS globals that weren't directly
used by an instruction. They could be used by constant expressions
which are transitively used by the function. This requires a better
search, but just abort on this for now for correctness.
2020-07-20 11:41:41 -04:00
Matt Arsenault 61f1f2a204 AMDGPU/GlobalISel: Initial Implementation of calls
Return values, and tail calls are not yet handled.
2020-07-20 11:13:22 -04:00
Matt Arsenault 780cef1f34 Verifier: Check byref address space for AMDGPU calling conventions 2020-07-20 11:13:11 -04:00
Matt Arsenault ad8e900cb3 Verifier: Disallow byval and similar for AMDGPU calling conventions
These imply stack-like semantics, which doesn't make any sense for
entry points.
2020-07-20 10:58:57 -04:00
Alok Kumar Sharma 2d10258a31 [DebugInfo] Support for DW_AT_associated and DW_AT_allocated.
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.

  for pointer array (before allocation/association)
  without DW_AT_associated

(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_associated

(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>

  for allocatable array (before allocation)

  without DW_AT_allocated

(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_allocated

(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>

    Testing
- unit test cases added
- check-llvm
- check-debuginfo

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83544
2020-07-20 19:54:35 +05:30
Matt Arsenault 5e999cbe8d IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.
2020-07-20 10:23:09 -04:00
Simon Pilgrim 017e5c949b MCFixup.h - remove unnecessary MCExpr.h include. NFCI.
Move the include down to files that actually depend on MCExpr definitions.

Also exposes an implicit dependency on MCContext in AVRAsmBackend.h
2020-07-20 15:17:19 +01:00
Petar Avramovic 6a1030aa0e AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault 100564bdf8 AMDGPU/GlobalISel: Remove outdated comment 2020-07-20 10:06:18 -04:00
Matt Arsenault 5cbd4e415e GlobalISel: Don't handle widenScalar for vector G_INSERT
This handling didn't make any sense for vectors.
2020-07-20 10:06:18 -04:00
Matt Arsenault 93311a9812 AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI
This was missing an operand from BFE and not erasing the original
instruction.
2020-07-20 10:06:18 -04:00
Matt Arsenault a679f27e98 GlobalISel: Consistently get TII from MIRBuilder 2020-07-20 10:06:18 -04:00
Benjamin Kramer e88b6ed748 [LLE] std::inserter doesn't work with SmallSet, so don't use it. 2020-07-20 15:47:42 +02:00
Benjamin Kramer 44ab60f74d [LoopSimplify] Use SmallPtrSet and range for loops more. NFCI. 2020-07-20 15:00:59 +02:00
Paul Walker 6384ec4099 [SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations.
Differential Revision: https://reviews.llvm.org/D84034
2020-07-20 11:57:34 +00:00
Florian Hahn dc1087d408 [Matrix] Add minimal lowering pass that only requires TTI.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.

At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But in subsequent patches add support for tiling
to the lowering pass which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.

Reviewers: anemet, Gerolf, efriedma, hfinkel

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76867
2020-07-20 11:16:11 +01:00
Tim Northover 88464a55b4 AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
2020-07-20 10:31:26 +01:00
Petar Avramovic ba938f6388 AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.

Differential Revision: https://reviews.llvm.org/D84010
2020-07-20 11:06:11 +02:00
Georgii Rymar 2a4df6a325 [llvm-readobj] - Refactor how the code dumps relocations.
There is a strange "feature" of the code: it handles all relocations as `Elf_Rela`.
For handling `Elf_Rel` it converts them to `Elf_Rela` and passes `bool IsRela` to
specify the real type everywhere.

A related issue is that the
`decode_relrs` helper in lib/Object has to return `Expected<std::vector<Elf_Rela>>`
because of that, though it could return a vector of `Elf_Rel`.

I think we should just start using templates for relocation types, it makes the code
cleaner and shorter. This patch does it.

Differential revision: https://reviews.llvm.org/D83871
2020-07-20 12:05:05 +03:00
Roman Lebedev 04b729d076
[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefully make hoisting off-by-default too.
2020-07-20 10:29:57 +03:00
Xing GUO c657602f3f [DWARFYAML] Add dependency 'BinaryFormat'. NFC.
This patch is trying to fix build failure.

Failed tests:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/12574
2020-07-20 13:58:22 +08:00
Lang Hames 0d944e00ea [ORC] Refactor TrampolinePool to reduce virtual function calls.
Virtual function calls are now only made when the pool needs to be
grown to accommodate o new request.
2020-07-19 22:38:41 -07:00
Xing GUO 65c63eb69c [DWARFYAML] Remove 'default' tag. NFC.
This patch is trying to make build bots happy.

Failed bots:
http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/10705
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/33595
2020-07-20 11:14:50 +08:00
Lang Hames cdd10ca280 [JITLink][MachO] Tidy up debugging output for relocation parsing.
Identify relocations by (section name, offset) pairs, rather than plain
vmaddrs. This makes it easier to cross-reference debugging output for
relocations with output from standard object inspection tools (otool,
readelf, objdump, etc.).
2020-07-19 19:45:50 -07:00
Xing GUO 1ab3d6c819 [DWARFYAML] Implement the .debug_rnglists section.
This patch implements the .debug_rnglists section. We are able to
produce the .debug_rnglists section by the following syntax.

```
debug_rnglists:
  - Format:              DWARF32 ## Optional
    Length:              0x1234  ## Optional
    Version:             5       ## Optional
    AddressSize:         0x08    ## Optional
    SegmentSelectorSize: 0x00    ## Optional
    OffsetEntryCount:    2       ## Optional
    Offsets:             [1, 2]  ## Optional
    Lists:
      - Entries:
          - Operator: DW_RLE_base_address
            Values:   [ 0x1234 ]
```

The generated .debug_rnglists is verified by llvm-dwarfdump, except for
the operator DW_RLE_startx_endx, since llvm-dwarfdump doesn't support
it.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D83624
2020-07-20 10:42:27 +08:00
Juneyoung Lee 30201d3b61 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison use canCreateUndefOrPoison
This patch adds support more operations.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D83926
2020-07-20 09:21:39 +09:00
Lang Hames f7a571537a [JITLink][MachO] Fix handling of non-extern UNSIGNED pair of SUBTRACTOR relocs.
When processing a MachO SUBTRACTOR/UNSIGNED pair, if the UNSIGNED target
is non-extern then check the r_symbolnum field of the relocation to find
the targeted section and use the section's address to find 'ToSymbol'.

Previously 'ToSymbol' was found by loading the initial value stored at
the fixup location and treating this as an address to search for. This
is incorrect, however: the initial value includes the addend and will
point to the wrong block if the addend is less than zero or greater than
the block size.

rdar://65756694
2020-07-19 10:22:55 -07:00
Jameson Nash 8b354cc8db [ConstantFolding] check applicability of AllOnes constant creation first
The getAllOnesValue can only handle things that are bitcast from a
ConstantInt, while here we bitcast through a pointer, so we may see more
complex objects (like Array or Struct).

Differential Revision: https://reviews.llvm.org/D83870
2020-07-19 13:13:57 -04:00
Juneyoung Lee 0a6aee5160 [ValueTracking] Add canCreateUndefOrPoison & let canCreatePoison use Operator
This patch
- adds `canCreateUndefOrPoison`
- refactors `canCreatePoison` so it can deal with constantexprs

`canCreateUndefOrPoison` will be used at D83926.

Reviewed By: nikic, jdoerfert

Differential Revision: https://reviews.llvm.org/D84007
2020-07-20 01:24:30 +09:00
Wenlei He d41d952be9 Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
This reverts commit 2d6ecfa168.
2020-07-19 08:49:04 -07:00
Wenlei He 2d6ecfa168 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

Subscribers: mgorny, aprantl, hiraditya, llvm-commits

Tags: #llvm

Resubmit for https://reviews.llvm.org/D84086
2020-07-19 08:21:05 -07:00
Sanjay Patel 50afa18772 [x86] split FMA with fast-math-flags to avoid libcall
fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware)

C/C++ code may use explicit fma() calls (which become LLVM fma
intrinsics in IR) but then gets compiled with -ffast-math or similar.
For targets that do not have FMA hardware, we don't want to go out to
the math library for a precise but slow FMA result.

I tried this as a generic DAGCombine, but it caused infinite looping
on more than 1 other target, so there's likely some over-reaching fma
formation happening.

There's also a potential intersection of strict FP with fast-math here.
Deferring to current behavior for that case (assuming that strict-ness
overrides fast-ness).

Differential Revision: https://reviews.llvm.org/D83981
2020-07-19 10:03:55 -04:00
Roman Lebedev 2f3862eb9f
Reland "[InstCombine] Lower infinite combine loop detection thresholds"
This reverts commit 4500db8c59,
which was reverted because lower thresholds exposed a new issue (PR46680).

Now that it was resolved by d12ec0f752,
we can reinstate lower limits and wait for a new bugreport before
reverting this again...
2020-07-19 16:37:03 +03:00
Nikita Popov c6e13667e7 [PredicateInfo] Add a method to interpret predicate as cmp constraint
Both users of predicteinfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.

This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.

Differential Revision: https://reviews.llvm.org/D83640
2020-07-19 15:34:32 +02:00
Roman Lebedev fb5577d4f8
[NFCI][GVN] Make IsValueFullyAvailableInBlock() readable - use enum class instead of magic numbers
This does not change any logic, it only wraps the magic 0/1/2/3 constants
into an enum class.
2020-07-19 16:33:56 +03:00
Sanjay Patel 7393d7574c [InstSimplify] fold fcmp with infinity constant using isKnownNeverInfinity
This is a step towards trying to remove unnecessary FP compares
with infinity when compiling with -ffinite-math-only or similar.
I'm intentionally not checking FMF on the fcmp itself because
I'm assuming that will go away eventually.
The analysis part of this was added with rGcd481136 for use with
isKnownNeverNaN. Similarly, that could be an enhancement here to
get predicates like 'one' and 'ueq'.

Differential Revision: https://reviews.llvm.org/D84035
2020-07-19 09:24:52 -04:00
Nikita Popov d12ec0f752 [InstCombine] Fix store merge worklist management (PR46680)
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.

Just like insertions through IRBuilder, InsertNewInstBefore()
should be using the deferred worklist mechanism, so that processing
of newly added instructions is prioritized.

There's one side-effect of the worklist order change which could be
classified as a regression. An add op gets pushed through a select
that at the time is not a umax. We could add a reverse transform
that tries to push adds in the reverse direction to restore a min/max,
but that seems like a sure way of getting infinite loops... Seems
like something that should best wait on min/max intrinsics.

Differential Revision: https://reviews.llvm.org/D84109
2020-07-19 15:05:45 +02:00
David Green 3504acc33e [ARM] Don't mark vctp as having sideeffects
As far as I can tell, it should not be necessary for VCTP to be
unpredictable in tail predicated loops. Either it has a a valid loop
counter as a operand which will naturally keep it in the right loop, or
it doesn't and it won't be converted to a tail predicated loop. Not
marking it as having side effects allows it to be scheduled more cleanly
for cases where it is not expected to become a tail predicate loop.

Differential Revision: https://reviews.llvm.org/D83907
2020-07-19 09:28:09 +01:00
Fangrui Song 2e74b6d80f [llvm-cov gcov] Don't require NUL terminator when reading files
.gcno, .gcda and source files can be modified while we are reading them. If the
concurrent modification of a file being read nullifies the NUL terminator
assumption, llvm-cov can trip over an assertion failure in MemoryBuffer::init.
This is not so rare - the source files can be in an editor and .gcda can be
written by an running process (if the process forks, when .gcda gets written is
probably more unpredictable).

There is no accompanying test because an assertion failure requires data
races with some involved setting.
2020-07-19 00:31:52 -07:00
Kang Zhang d37befdfe5 [PowerPC] Remove the redundant implicit operands in ppc-early-ret pass
Summary:
In the `ppc-early-ret` pass, we have use `BuildMI` and `copyImplicitOps` when the branch instructions can do the early return. But the two functions will add implicit operands twice, this is not correct.

This patch is to remove the redundant implicit operands in `ppc-early-ret pass`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D76042
2020-07-19 07:01:45 +00:00
Yuanfang Chen 606e756bb1 [NewPM] make parsePassPipeline parse adaptor-wrapped user passes
Currently, when parsing text pipeline, different kinds of passes always
introduce nested pass managers. This makes it impossible to test the
adaptor-wrapped user passes from the text pipeline interface which is needed
by D82344 test cases. This also seems useful in general. See comments above
`parsePassPipeline`.

The syntax would be like mixing passes of different types, but it is
not the same as inferring the correct pass type and then adding the
matching nested pass managers. Strictly speaking, the resulted pipelines
are different.

Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D82698
2020-07-18 22:26:37 -07:00
Fangrui Song 5809a32e7c [gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush
GCC r187297 (2012-05) introduced `__gcov_dump` and `__gcov_reset`.
  `__gcov_flush = __gcov_dump + __gcov_reset`

The resolution to https://gcc.gnu.org/PR93623 ("No need to dump gcdas when forking" target GCC 11.0) removed the unuseful and undocumented __gcov_flush.

Close PR38064.

Reviewed By: calixte, serge-sans-paille

Differential Revision: https://reviews.llvm.org/D83149
2020-07-18 15:07:46 -07:00
Roman Lebedev 9dceb32f30
[NFC][CVP] processSDiv(): pacify gcc compilers 2020-07-18 19:41:43 +03:00
Fangrui Song 3ab0f53ef3 [DebugInfo] Respect relocations when decoding DW_EH_PE_sdata4 & DW_EH_PE_sdata8 and support R_ARM_REL32
The addresses in llvm-dwarfdump --eh-frame output for object files are closer to readelf -wf output now.
2020-07-18 09:00:50 -07:00
Florian Hahn 4b19cccbb5 [PredicateInfo] Fold PredicateWithCondition into PredicateBase (NFC).
Each concrete instance of a predicate has a condition (also noted in the
original PredicateBase comment) and to me it seems like there is no
clear benefit of having both PredicateBase and PredicateWithCondition
and they can be folded together.

Reviewers: nikic, efriedma

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84089
2020-07-18 16:21:56 +01:00
Roman Lebedev 8d487668d0
[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands.
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But  a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and  b. we have better/more analyzes/folds for an unsigned division,
we could end up narrowing it's bitwidth, converting it to lshr, etc.

This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in it's analysis.
While i could teach it that, this seems like the more general fix.

This big hammer actually does catch said potential regression.

Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.

https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1
2020-07-18 17:59:56 +03:00
Roman Lebedev 45b7388824
[NFC][CVP] Rename predicates - s/positive/non negative/ to better note that zero is ok 2020-07-18 17:59:32 +03:00
Roman Lebedev 2cde6984d8
[NFC][CVP] Refactor isPositive() out of hasPositiveOperands() 2020-07-18 17:59:32 +03:00
Evgeny Leviant 24089928be [CodeGen][TargetPassConfig] Add TargetTransformInfo pass correctly
Patch adds tti pass directly enforcing its execution with correctly set
TargetTransformInfo.

Differential revision: https://reviews.llvm.org/D84047
2020-07-18 14:11:40 +03:00
Fangrui Song 3073a3aa1e [RelocationResolver] Support R_AARCH64_PREL32
Code from D83800 by Yichao Yu
2020-07-17 23:49:15 -07:00
Fangrui Song b922004ea2 [RelocationResolver] Support R_PPC_REL32 & R_PPC64_REL{32,64}
This suppresses `failed to compute relocation: R_PPC_REL32, Invalid data was encountered while parsing the file`
and its 64-bit variants when running llvm-dwarfdump on a PowerPC object file with .eh_frame

Unfortunately it is difficult to test the computation:
DWARFDataExtractor::getEncodedPointer does not use the relocated value
and even if it does, we need to teach llvm-dwarfdump --eh-frame to do
some linker job to report a reasonable address.
2020-07-17 23:29:50 -07:00
Gui Andrade 951584db4f Revert "update libatomic instrumentation"
This was committed mistakenly.

This reverts commit 1f29171ae7.
2020-07-18 03:53:00 +00:00
Gui Andrade 1f29171ae7 update libatomic instrumentation 2020-07-18 03:39:21 +00:00
Gui Andrade c42509413f [LLVM] Add libatomic load/store functions to TargetLibraryInfo
This allows treating these functions like libcalls.
This patch is a prerequisite to instrumenting them in MSAN: https://reviews.llvm.org/D83337

Differential Revision: https://reviews.llvm.org/D83361
2020-07-18 03:18:48 +00:00
Chen Zheng 6d247f980d [SCEV][IndVarSimplify] insert point should not be block front.
Recommit after removing the unused cast instructions.

Differential Revision:  https://reviews.llvm.org/D80975
2020-07-17 22:25:10 -04:00
Kuba Mracek 176a6e7abe [asan] Use dynamic shadow memory position on Apple Silicon macOS
This is needed because macOS on Apple Silicon has some reserved pages inside the "regular" shadow memory location, and mapping over that location fails.

Differential Revision: https://reviews.llvm.org/D82912
2020-07-17 17:40:21 -07:00
Logan Smith 3ee7fe4cfd [llvm][NFC] Add missing 'override's 2020-07-17 17:35:59 -07:00
Arthur Eubanks 0dfa4a83fa Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality."
This reverts commit 30c382a7c6.

See https://crbug.com/1106813.
2020-07-17 16:47:41 -07:00
Aditya Nandakumar 63c081e73d [GISel: Add support for CSEing SrcOps which are immediates
https://reviews.llvm.org/D84072

Add G_EXTRACT to CSEConfigFull and add unit test as well.
2020-07-17 16:04:24 -07:00
Leonard Chan cf5df40c4c Revert "[AddressSanitizer] Don't use weak linkage for __{start,stop}_asan_globals"
This reverts commit d76e62fdb7.

Reverting since this can lead to linker errors:

```
ld.lld: error: undefined hidden symbol: __start_asan_globals
```

when using --gc-sections. The linker can discard __start_asan_globals
once there are no more `asan_globals` sections left, which can lead to
this error if we have external linkages to them.
2020-07-17 15:29:50 -07:00
Eric Christopher ae08dbc673 Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
as it is failing the inline-replay.ll test as well as sanitizers/Werror
from returning a stack local variable.

This reverts commit 029946b112.
2020-07-17 14:58:01 -07:00
Wenlei He 029946b112 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Summary:
This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed.

Reviewers: davidxl, mtrofin, wmi, hoy

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83743
2020-07-17 13:30:47 -07:00
Roman Lebedev 0fdcca07ad
[InstCombine] Fold X sdiv (-1 << C) -> -(X u>> Y) iff X is non-negative
This is the one i'm seeing as missed optimization,
although there are likely other possibilities, as usual.

There are 4 variants of a general sdiv->udiv fold:
https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1


If we really don't like sdiv (more than udiv that is),
and are okay with increasing instruction count (2 new negations),
and we ensure that we don't undo the fold,
then we could just implement these..
2020-07-17 22:50:09 +03:00
Stanislav Mekhanoshin efb5040262 Fixed warning about signed/unsigned comparison
I've got the report clang11 issues signed/unsigned mismatch
warning here. For some reason only clang11 seems to issue
this warning.

Differential Revision: https://reviews.llvm.org/D83916
2020-07-17 11:03:42 -07:00
Dmitry Preobrazhensky 2e87acac9b [AMDGPU] Removed s_mov_regrd and mov_fed opcodes
These opcodes are not intended for public use.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D81659
2020-07-17 19:52:54 +03:00
Yonghong Song 0e347c0ff0 BPF: generate .rodata BTF datasec for certain initialized local var's
Currently, BTF datasec type for .rodata is generated only if there are
user-defined readonly global variables which have debuginfo generated.

Certain readonly global variables may be generated from initialized
local variables. For example,
  void foo(const void *);
  int test() {
    const struct {
      unsigned a[4];
      char b;
    } val = { .a = {2, 3, 4, 5}, .b = 6 };
    foo(&val);
    return 0;
  }

The clang will create a private linkage const global to store
the initialized value:
  @__const.test.val = private unnamed_addr constant %struct.anon
      { [4 x i32] [i32 2, i32 3, i32 4, i32 5], i8 6 }, align 4

This global variable eventually is put in .rodata ELF section.

If there is .rodata ELF section, libbpf expects a BTF .rodata
datasec as well even though it may be empty meaning there are no
global readonly variables with proper debuginfo. Martin reported
a bug where without this empty BTF .rodata datasec, the bpftool
gen will exit with an error.

This patch fixed the issue by generating .rodata BTF datasec
if there exists local var intial data which will result in
.rodata ELF section.

Differential Revision: https://reviews.llvm.org/D84002
2020-07-17 09:45:57 -07:00
Fangrui Song f8a29b174a [OptTable] Support grouped short options
POSIX.1-2017 12.2 Utility Syntax Guidelines, Guideline 5 says:

> One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one '-' delimiter.

i.e. -abc represents -a -b -c. The grouped short options are very common.  Many
utilities extend the syntax by allowing (an option with an argument) following a
sequence of short options.

This patch adds the support to OptTable, similar to cl::Group for CommandLine
(D58711).  llvm-symbolizer will use the feature (D83530). CommandLine is exotic
in some aspects. OptTable is preferred if the user wants to get rid of the
behaviors.

* `cl::opt<bool> i(...)` can be disabled via -i=false or -i=0, which is
  different from conventional --no-i.
* Handling --foo & --no-foo requires a comparison of argument positions,
  which is a bit clumsy in user code.

OptTable::parseOneArg (non-const reference InputArgList) is added along with
ParseOneArg (const ArgList &). The duplicate does not look great at first
glance. However, The implementation can be simpler if ArgList is mutable.
(ParseOneArg is used by clang-cl (FlagsToInclude/FlagsToExclude) and lld COFF
(case-insensitive). Adding grouped short options can make the function even more
complex.)

The implementation allows a long option following a group of short options. We
probably should refine the code to disallow this in the future. Allowing this
seems benign for now.

Reviewed By: grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D83639
2020-07-17 09:32:43 -07:00
Nikita Popov f7dce88915 [IR] Fix MSVC warning (NFC)
As requested by Andrew Kaylor, rewrite this code in a way that does
not warn on old MSVC versions.

Avoid the buggy constexpr warning by just not using constexpr and
removing the static_assert that depends on it.
2020-07-17 18:27:39 +02:00
Matt Arsenault 994fb86bc2 AMDGPU: Fix promoting f16 fpowi with legal f16 2020-07-17 11:29:05 -04:00
Florian Hahn 31d71c69f1 [Matrix] Only run matrix lowering early with -O0.
Currently matrix lowering is run twice if OptLevel > 0. Fix that and
also add a test for OptLevel > 0 with matrix lowering enabled.
2020-07-17 15:53:16 +01:00
David Tenty 8dea7f3202 [z/OS][AIX] Move lambda definition to fix build problem
This is a follow on change to eed19bd8 and contains a fix for a build
failure that occurs on both z/OS and AIX as a result of this commit:

https://reviews.llvm.org/rG670915094462d831e3733e5b01a76471b8cf6dd8.
2020-07-17 10:08:01 -04:00
Sidharth Baveja 11e879d4f1 [Loop Simplify] Resolve an issue where metadata is not applied to a loop latch.
Summary:
This patch resolves an issue where the metadata of a loop is not added to the
new loop latch, and not removed from the old loop latch. This issue occurs in
the SplitBlockPredecessors function, which  adds a new block in a loop, and
in the case that the block passed into this function is the header of the loop,
the loop can be modified such that the latch of the loop is replaced.
This patch applies to the Loop Simplify pass since it ensures that each loop
has exit blocks which only have predecessors that are inside of the loop. In
the case that this is not true, the pass will create a new exit block for the
loop. This guarantees that the loop preheader/header will dominate the exit blocks.

Author: sidbav (Sidharth Baveja)

Reviewers: asbirlea (Alina Sbirlea), chandlerc (Chandler Carruth), Whitney (Whitney Tsang), bmahjour (Bardia Mahjour)

Reviewed By:  asbirlea (Alina Sbirlea)

Subscribers: hiraditya (Aditya Kumar), llvm-commits

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D83869
2020-07-17 14:02:14 +00:00
Eric Astor 47a3b85a97 [ms] [llvm-ml] Remove unused function
Summary: Remove unused function

Reviewed By: lbenes

Differential Revision: https://reviews.llvm.org/D83898
2020-07-17 09:06:37 -04:00
Anna Welker 23c9534515 [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces much less instructions than the previous approach of
creating scalar getelementpointers and glue them together to a
vector.

Differential Revision: https://reviews.llvm.org/D81267
2020-07-17 13:35:07 +01:00
Georgii Rymar 6227f04a09 [llvm-readobj] - Add proper testing for the SHT_MIPS_ABIFLAGS section.
This rewrites the mips-abiflags.test to stop using recompiled objects,
adds testing for all missed bits and also adds two missing enum values
to lib/ObjectYAML, which are used in the new test.

Differential revision: https://reviews.llvm.org/D83954
2020-07-17 15:24:39 +03:00
Benjamin Kramer 9a0689e072 Make helpers static. NFC. 2020-07-17 13:49:11 +02:00
Sam Tebbs 6c348e4067 [HWLoops] Stop converting to a while loop when it would be unsafe to
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.

This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop.

Differential Revision: https://reviews.llvm.org/D83953
2020-07-17 11:47:08 +01:00
Jay Foad 760af7a074 [AMDGPU] Avoid splitting FLAT offsets in unsafe ways
As explained in the comment:

// For a FLAT instruction the hardware decides whether to access
// global/scratch/shared memory based on the high bits of vaddr,
// ignoring the offset field, so we have to ensure that when we add
// remainder to vaddr it still points into the same underlying object.
// The easiest way to do that is to make sure that we split the offset
// into two pieces that are both >= 0 or both <= 0.

In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have
an unsigned immediate offset field, so we can't use it to help split a
negative offset.

Differential Revision: https://reviews.llvm.org/D83394
2020-07-17 11:44:10 +01:00
Jay Foad 62fd7f767c [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Florian Hahn e297006d6f [ScheduleDAG] Move DBG_VALUEs after first term forward.
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring a debug value that gets folded into a TCRETURN
instruction on ARM.

This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators produce
values that can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like currently there is no way to insert a proper DBG_VALUEs
for such registers anyways.

Alternatively it might make sense to just remove those extra DBG_VALUES.

I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.

Reviewers: vsk, aprantl, jpaquette, efriedma, paquette

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83561
2020-07-17 10:27:43 +01:00
Marco Elver 785d41a261 [TSan] Add option for emitting compound read-write instrumentation
This adds option -tsan-compound-read-before-write to emit different
instrumentation for the write if the read before that write is omitted
from instrumentation. The default TSan runtime currently does not
support the different instrumentation, and the option is disabled by
default.

Alternative runtimes, such as the Kernel Concurrency Sanitizer (KCSAN)
can make use of the feature. Indeed, the initial motivation is for use
in KCSAN as it was determined that due to the Linux kernel having a
large number of unaddressed data races, it makes sense to improve
performance and reporting by distinguishing compounded operations. E.g.
the compounded instrumentation is typically emitted for compound
operations such as ++, +=, |=, etc. By emitting different reports, such
data races can easily be noticed, and also automatically bucketed
differently by CI systems.

Reviewed By: dvyukov, glider

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83867
2020-07-17 10:24:20 +02:00
Simon Wallis 3e0ccf9a90 [ARM] halfword store hits llvm_unreachable with big-endian
Summary:
[ARM] halfword store hits llvm_unreachable with big-endian

Provide missing case in getFixupKindContainerSizeBytes().

This stops execution reaching llvm_unreachable("Unknown fixup kind!")

D83947

Reviewers: olista01, ostannard

Reviewed By: ostannard

Subscribers: ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83947

Change-Id: I598aa1fb51fd1c6f424c557c85d6df6d1958bc62
2020-07-17 08:56:44 +01:00
Max Kazantsev c989881078 [InstCombine] Fix replace select with Phis when branch has the same labels
```
define i32 @test(i1 %cond) {
entry:
  br i1 %cond, label %exit, label %exit
exit:
  %result = select i1 %cond, i32 123, i32 456
  ret i32 %result
}
```
In this test, after applying transformation of replacing select with Phis,
the result will be:

```
define i32 @test(i1 %cond) {
entry:
  br i1 %cond, label %exit, label %exit
exit:
  %result = i32 phi [123, %exit], [123, %exit]
  ret i32 %result
}
```
That is, select is transformed into an invalid Phi, which will then be
reduced to 123 and the second value will be lost. But it is worth
noting that this problem will arise only if select is in the InstCombine
worklist will be before the branch. Otherwise, InstCombine will replace
the branch condition with false and transformation will not be applied.

The fix is to check the target labels in the branch condition for equality.

Patch By: Kirill Polushin
Differential Revision: https://reviews.llvm.org/D84003
Reviewed By: mkazantsev
2020-07-17 14:04:58 +07:00
hsmahesha 4905536086 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit cc9d693856.
2020-07-17 12:20:37 +05:30
Igor Kudrin f76a0cd97a [DebugInfo] Fix a misleading usage of DWARF forms with DIEExpr. NFCI.
For now, DIEExpr is used only in two places:

 1) in the debug info library unit test suite to emit
    a DW_AT_str_offsets_base attribute with the DW_FORM_sec_offset
    form, see dwarfgen::DIE::addStrOffsetsBaseAttribute();

 2) in DwarfCompileUnit::addLocationAttribute() to generate the location
    attribute for a TLS variable.

The later case used an incorrect DWARF form of DW_FORM_udata, which
implies storing an uleb128 value, not a 4/8 byte constant. The generated
result was as expected because DIEExpr::SizeOf() did not handle the used
form, but returned the size of the code pointer by default.

The patch fixes the issue by using more appropriate DWARF forms for
the problematic case and making DIEExpr::SizeOf() more straightforward.

Differential Revision: https://reviews.llvm.org/D83958
2020-07-17 13:49:27 +07:00
Craig Topper 6bba95831e [X86] Change the scheduler model for 'pentium4' to SandyBridgeModel.
I meant to do this in D83913, but missed it while updating the
feature list.

Interestingly I think this is disabling the postRA scheduler. But
it does match our default 64-bit behavior.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D83996
2020-07-16 22:04:29 -07:00
Craig Topper addbf732c8 [X86] Reorder how the subtarget map key is created.
We use a SmallString<512> and attempted to reserve enough space
for CPU plus Features, but that doesn't account for all the things
that get added to the string.

Reorder the string so the shortest things go first which shouldn't
exceed the small size. Finally add the feature string at the end
which might be long. This should ensure at most one heap allocation
without needing to use reserve.

I don't know if this matters much in practice, but I was looking
into something else that will require more code here and noticed
the odd reserve call.
2020-07-16 21:41:45 -07:00
Juneyoung Lee 582901d0b5 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison consider noundef
This patch adds support for noundef arguments.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D83752
2020-07-17 12:53:08 +09:00
Xing GUO dc65f57124 [DWARFYAML] Merge forms that use same encodings. NFC. 2020-07-17 11:31:49 +08:00
Carl Ritson 3a18665748 [AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation
When SCC is dead, but VCC is required then replace s_and / s_andn2
with s_mov into VCC when mask value is 0 or -1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D83850
2020-07-17 11:48:57 +09:00
Lang Hames 10056238ac [ORC] Switch from initializer lists to named arguments to work around MSVC.
MSVC doesn't like some of the initializer list uses in 0e940d55f8.
Switch to named arguments to work around this.
2020-07-16 15:58:31 -07:00
Lang Hames b0bc77380d [ORC] Add more explicit casts to fix a narrowing conversion errors. 2020-07-16 15:37:18 -07:00
Lang Hames 121302ac62 [ORC] Add explicit cast to fix a narrowing conversion error. 2020-07-16 15:33:02 -07:00
Albion Fung c273563552 [PowerPC][Power10] Add 128-bit Binary Integer Operation instruction definitions and MC Tests
This patch adds the instruction definitions and MC tests for the 128-bit Binary
Integer Operation instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D83516
2020-07-16 17:16:43 -05:00
Lang Hames 0e940d55f8 [ORC] Add TargetProcessControl and TPCIndirectionUtils APIs.
TargetProcessControl is a new API for communicating with JIT target processes.
It supports memory allocation and access, and inspection of some process
properties, e.g. the target proces triple and page size.

Centralizing these APIs allows utilities written against TargetProcessControl
to remain independent of the communication procotol with the target process
(which may be direct memory access/allocation for in-process JITing, or may
involve some form of IPC or RPC).

An initial set of TargetProcessControl-based utilities for lazy compilation is
provided by the TPCIndirectionUtils class.

An initial implementation of TargetProcessControl for in-process JITing
is provided by the SelfTargetProcessControl class.

An example program showing how the APIs can be used is provided in
llvm/examples/OrcV2Examples/LLJITWithTargetProcessControl.
2020-07-16 15:09:13 -07:00
Jon Roelofs a0537fc35f [SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build
SimplifyCFG was incorrectly reporting to the pass manager that it had not made
changes after folding away a PHI.  This is detected in the EXPENSIVE_CHECKS
build when the function's hash changes.

Differential Revision: https://reviews.llvm.org/D83985
2020-07-16 15:34:41 -06:00
Wouter van Oortmerssen cc1b9b680f [WebAssembly] 64-bit (function) pointer fixes.
Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers.
Includes reloc types for 64-bit table indices.

Differential Revision: https://reviews.llvm.org/D83729
2020-07-16 14:10:22 -07:00
Craig Topper 5408024fa8 [X86] Move integer hadd/hsub formation into a helper function shared by combineAdd and combineSub.
There was a lot of duplicate code here for checking the VT and
subtarget. Moving it into a helper avoids that.

It also fixes a bug that combineAdd reused Op0/Op1 after a call
to isHorizontalBinOp may have changed it. The new helper function
has its own local version of Op0/Op1 that aren't shared by other
code.

Fixes PR46455.

Reviewed By: spatel, bkramer

Differential Revision: https://reviews.llvm.org/D83971
2020-07-16 13:27:27 -07:00
Denis Antrushin e04fe9aefd [Statepoint] Fix bug found by sanitaizer.
Statepoint has no static operands, so it cannot be verified
against MCInstrDescr. Revert NumDefs change introduced by ef658ebd62.
2020-07-16 23:06:53 +03:00
Craig Topper ad171d24b9 [X86] Change the tuning settings for pentium4 to be more modern since its the default 32-bit cpu in clang
Alternative to D83897. I believe the big change here is that I removed slow unaligned memory 16

Down side that it may adversely effect tuning if someone explicitly targets -march=pentium4 and expects pentium4 tuned code. Of course pentium4 is so old our default behavior with the previous settings may not have been the best either.

Reviewed By: echristo, RKSimon

Differential Revision: https://reviews.llvm.org/D83913
2020-07-16 12:51:25 -07:00
Mircea Trofin 9870f77441 [llvm] Moved InlineSizeEstimatorAnalysis test to .ll
Summary:
Following guidance in
https://llvm.org/docs/TestingGuide.html#testing-analysis

Reviewers: mehdi_amini

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83918
2020-07-16 12:25:16 -07:00
Wouter van Oortmerssen 29f8c9f6c2 [WebAssembly] Triple::wasm64 related cleanup
Differential Revision: https://reviews.llvm.org/D83713
2020-07-16 12:01:10 -07:00
Eric Christopher 7bfaa40086 Temporarily Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions"
due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753.

An SROA change soon may obviate some of these problems.

This reverts commit 8d09f20798.
2020-07-16 11:54:04 -07:00
Nadav Rotem 8f0a8ed44e [InjectTLIMappings] Use StringRef instead of std::string for FN name.
https://reviews.llvm.org/D83797
2020-07-16 11:53:04 -07:00
Zakk Chen 294d1eae75 [RISCV] Add support for -mcpu option.
Summary:
1. gcc uses `-march` and `-mtune` flag to chose arch and
pipeline model, but clang does not have `-mtune` flag,
we uses `-mcpu` to chose both infos.
2. Add SiFive e31 and u54 cpu which have default march
and pipeline model.
3. Specific `-mcpu` with rocket-rv[32|64] would select
pipeline model only, and use the driver's arch choosing
logic to get default arch.

Reviewers: lenary, asb, evandro, HsiangKai

Reviewed By: lenary, asb, evandro

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D71124
2020-07-16 11:46:22 -07:00
Nadav Rotem a394aa1b97 [LiveVariables] Replace std::vector with SmallVector.
Replace std::vector with SmallVector to reduce the number of mallocs.
This method is frequently executed, and the number of elements in the
vector is typically small.

https://reviews.llvm.org/D83920
2020-07-16 11:39:54 -07:00
Thomas Lively ecb2e5bcd7 [WebAssembly] Implement v128.select
Although the SIMD spec proposal does not specifically include a
select instruction, the select instruction in MVP WebAssembly is
polymorphic over the selected types, so it is able to work on v128
values when they are enabled. This patch introduces a new variant of
the select instruction for each legal vector type. Additional ISel
patterns are adapted from the SELECT_I32 and SELECT_I64 patterns.

Depends on D83736.

Differential Revision: https://reviews.llvm.org/D83737
2020-07-16 11:37:25 -07:00
Matt Arsenault 1912ace968 AMDGPU: Move handling of AGPR copies to a separate function
This is in preparation for fixing multiple problems with the way AGPR
copies are handled, but this change is NFC itself. First, it's relying
on recursively calling copyPhysReg, which is losing information
necessary to get correct super register handling.

Second, it's constructing a new RegScavenger and doing a O(N^2) walk
on every single sub-spill for every AGPR tuple copy. Third, it's using
the forward form of the scavenger, and not using the preferred
backwards scan.
2020-07-16 14:32:24 -04:00
Arthur Eubanks 9adbb5cb3a [SCEV] Fix ScalarEvolution tests under NPM
Many tests use opt's -analyze feature, which does not translate well to
NPM and has better alternatives. The alternative here is to explicitly
add a pass that calls ScalarEvolution::print().

The legacy pass manager RUNs aren't changing, but they are now pinned to
the legacy pass manager.  For each legacy pass manager RUN, I added a
corresponding NPM RUN using the 'print<scalar-evolution>' pass. For
compatibility with update_analyze_test_checks.py and existing test
CHECKs, 'print<scalar-evolution>' now prints what -analyze prints per
function.

This was generated by the following Python script and failures were
manually fixed up:

import sys
for i in sys.argv:
    with open(i, 'r') as f:
        s = f.read()
    with open(i, 'w') as f:
        for l in s.splitlines():
            if "RUN:" in l and ' -analyze ' in l and '\\' not in l:
                f.write(l.replace(' -analyze ', ' -analyze -enable-new-pm=0 '))
                f.write('\n')
                f.write(l.replace(' -analyze ', ' -disable-output ').replace(' -scalar-evolution ', ' "-passes=print<scalar-evolution>" ').replace(" | ", " 2>&1 | "))
                f.write('\n')
            else:
                f.write(l)

There are a couple failures still in ScalarEvolution under NPM, but
those are due to other unrelated naming conflicts.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D83798
2020-07-16 11:24:07 -07:00
Thomas Lively f0f9787646 [WebAssembly] Lower vselect to v128.bitselect
We were previously expanding vselect and matching on the expansion to
generate bitselects, but in some cases the expansion would be further
combined and a bitselect would not get generated. This patch improves
codegen in those cases by legalizing vselect and lowering it to
v128.bitselect. The old pattern that matches the expansion is still
useful for lowering IR that already uses the expansion rather than a
select operation.

Differential Revision: https://reviews.llvm.org/D83734
2020-07-16 11:11:19 -07:00
Craig Topper 9adf7461f7 [X86] Add test case for PR46455. 2020-07-16 11:06:55 -07:00
Matt Arsenault 9d3e56e2ee DAG: Try scalarizing when expanding saturating add/sub
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.

Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
2020-07-16 14:05:16 -04:00
Denis Antrushin ef658ebd62 MIR Statepoint refactoring. Part 1: Basic MI level changes.
Basic support for variadic-def MIR Statepoint:
- Change TableGen STATEPOINT description to variadic out list
  (For self-documentation purpose; by itself it does not affect
  code generation in any way).
- Update StatepointOpers helper class to handle variadic defs.
- Update MachineVerifier to properly handle them, too.

With this change, new Statepoint instruction can be passed through
backend (excluding ISEL) without errors.

Full change set is available at D81603.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81645
2020-07-17 00:57:21 +07:00
Matt Arsenault d909764cc7 Use findEnumAttribute helper for preallocated 2020-07-16 13:50:49 -04:00
Matt Arsenault 023883a834 IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.

The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
2020-07-16 13:50:49 -04:00
Matt Arsenault 0347039a6e ValueTracking: Fix isKnownNonZero for non-0 null pointers for byval
The IR doesn't have a proper concept of invalid pointers, and "null"
constants are just all zeros (though it really needs one).

I think it's not possible to break this for AMDGPU due to the copy
semantics of byval. If you have an original stack object at 0, the
byval copy will be placed above it so I don't think it's really
possible to hit a 0 address.
2020-07-16 13:50:49 -04:00