Commit Graph

151087 Commits

Author SHA1 Message Date
Jay Foad 128a49727a [AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC.
The warning is implemented by D109359 which is still in review.

Differential Revision: https://reviews.llvm.org/D109826
2021-09-16 09:07:18 +01:00
Sam Parker c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32b unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If this was instead a
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of lvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
Alok Kumar Sharma a5b72abc9e [DebugInfo] Enhance DIImportedEntity to accept children entities
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D109343
2021-09-16 10:41:55 +05:30
Kazu Hirata 24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Jessica Paquette c8b3d7d6d6 [AArch64][GlobalISel] Ensure atomic loads always get assigned GPR destinations
The default register bank selection code for G_LOAD assumes that we ought to
use a FPR when the load is casted to a float/double.

For atomics, this isn't true; we should always use GPRs.

Without this patch, we crash in the following example:

https://godbolt.org/z/MThjas441

Also make the code a little more stylistically consistent while we're here.

Also test some other weird cast combinations as well.

Differential Revision: https://reviews.llvm.org/D109771
2021-09-15 17:05:09 -07:00
Ahmed Bougacha e159d3cbfc [AArch64][GlobalISel] Use MI::getIntrinsicID in more spots. NFC.
There's technically a difference in the logic used by these
findIntrinsicID and MachineInstr::getIntrinsicID, but it shouldn't
be a meaningful difference here, with G_INTRINSIC instructions.
getIntrinsicID's "first non-def" logic should be correct for those.
2021-09-15 16:45:34 -07:00
Ahmed Bougacha 94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
Arthur Eubanks c3ddc13d7d [NFC] Split up PassBuilder.cpp
PassBuilder.cpp is the slowest file to compile in LLVM.
When trying to test changes to pipelines, it takes a long time to recompile.

This doesn't actually speedup building PassBuilder.cpp itself since most
of the time is spent in other large/duplicated functions caused by
PassRegistry.def.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109798
2021-09-15 15:30:39 -07:00
Owen Anderson 68079ef0eb Teach SimplifyCFG to fold switches into lookup tables in more cases.
In particular, it couldn't handle cases where lookup table constant
expressions involved bitcasts. This does not seem to come up
frequently in C++, but comes up reasonably often in Rust via
`#[derive(Debug)]`.

Originally reported by pcwalton.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109565
2021-09-15 22:07:08 +00:00
Anna Thomas f9e4aebe4a Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses"
This reverts commit 4ac4e52189.
There are couple of test failures, which needs update of the test cases.

Doing a clean revert and will recommit the change along with fixed
testcases.
2021-09-15 18:03:11 -04:00
David Blaikie 065bb08bb8 NFC: DWARFTypePrinter: Remove "type" from member function names to reduce redundancy 2021-09-15 14:46:28 -07:00
Anna Thomas b6cb03e6b9 Revert use of getUniqueUndroppableUser in AssumeBundleBuilder
Fix build bot failure in rG4ac4e521 caused due to assumeBundleBuilder
using new API (getUniqueUndroppableUser).
We now continue using the existing API for AssumeBundleBuilder
(getSingleUndroppableUser).

Sorry for the noise here.

Tests-Run: failing testcase passes.
2021-09-15 17:45:09 -04:00
Matt Arsenault 87c00878d3 SplitKit: Remove decade old live interval hack
This was trying to fixup broken live intervals coming out of the
coalescer. The verifier is more complete now and no tests seem to fail
without this.
2021-09-15 17:35:59 -04:00
Anna Thomas 3273430406 Re-add getSingleUndroppableUse API
The API was removed in 4ac4e52189 in favor of
getUniqueUndroppableUser.
However, this caused a buildbot failure in AbstractCallSiteTest.cpp,
which uses the API and the AbstractCallSite class requires a "use"
rather than a user.
Retain the API so that the unittest compiles and passes.
2021-09-15 17:06:20 -04:00
Anna Thomas 4ac4e52189 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also, the API for retrieving undroppable user has been updated accordingly since
in both usecases (Attributor and InstCombine), we seem to care about the user,
rather than the use.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-15 20:39:38 +00:00
Kazu Hirata 385f380e80 [MemorySSA] Fix "set but not used" warnings 2021-09-15 11:41:41 -07:00
Sanjay Patel e5a32d720e [InstCombine] move extend after insertelement if both operands are extended
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:

inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)

https://alive2.llvm.org/ce/z/z2aBu9

Note that there are several possible extensions of this fold
(see TODO comments).

Differential Revision: https://reviews.llvm.org/D109537
2021-09-15 14:38:03 -04:00
Philip Reames 9bdb19cca2 [SCEV] (udiv X, Y) * Y is always NUW
Motivated by the removal done in D109782. This implements the correct flag part generically.

Differential Revision: https://reviews.llvm.org/D109786
2021-09-15 11:34:50 -07:00
Alina Sbirlea b759381b75 [MemorySSA] Add verification levels to MemorySSA. [NFC]
Add two levels of verification for MemorySSA: Fast and Full.
The defaults are kept the same. Full verification always occurs under
EXPENSIVE_CHECKS, but now it can also be requested in a specific pass for
debugging purposes.
2021-09-15 11:09:54 -07:00
Filipp Zhinkin f5d8952356 [InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y)
Enabled mul folding optimization that was previously disabled
by being incorrect.
To preserve correctness, mul's operand that is not compared
with zero in select's condition is now frozen.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
2021-09-15 09:04:06 -04:00
Simon Pilgrim 0767e43d87 [CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports
Based off the worse case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).
2021-09-15 13:04:40 +01:00
Martin Storsjö b33a43e57c [ARM] Move fetching of ARMSubtarget into the scopes that need it. NFC.
This was requested in D38253, but missed back then.

Differential Revision: https://reviews.llvm.org/D109046
2021-09-15 15:03:20 +03:00
David Green a2332d5332 [ARM] Prevent continuous folding of SUBC
Under some situations under Thumb1, we could be stuck in an infinite
loop recombining the same instruction. This puts a limit on that, not
combining SUBC with SUBE repeatedly.
2021-09-15 11:23:32 +01:00
David Green 61cc873a8e [LV] Recognize intrinsic min/max reductions
This extends the reduction logic in the vectorizer to handle intrinsic
versions of min and max, both the floating point variants already
created by instcombine under fastmath and the integer variants from
D98152.

As a bonus this allows us to match a chain of min or max operations into
a single reduction, similar to how add/mul/etc work.

Differential Revision: https://reviews.llvm.org/D109645
2021-09-15 10:45:50 +01:00
Simon Pilgrim dcba994184 [X86] combineX86ShuffleChain - ensure we only peek through bitcasts to vectors (PR51858)
When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.
2021-09-15 10:21:05 +01:00
Simon Atanasyan 533471ff2f [MIPS] Remove unused tblgen template args. NFC
Identified in D109359.
2021-09-15 12:16:07 +03:00
Cullen Rhodes 18655140d6 [NVPTX] NFC: Remove unused imm type intrinsic arg
Identified in D109359.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109755
2021-09-15 08:56:51 +00:00
Florian Hahn e90d55e1c9
[VPlan] Support sinking recipes with uniform users outside sink target.
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
2021-09-15 09:21:39 +01:00
Xiang1 Zhang 1f1c71aeac [X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm.
Differential Revision: https://reviews.llvm.org/D109739
2021-09-15 16:11:14 +08:00
Amara Emerson 5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD(X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Markus Lavin 1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
Hongtao Yu 299b5d420d [CSSPGO] Enable pseudo probe instrumentation in O0 mode.
Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode.

Reviewed By: wenlei, wlei, wmi

Differential Revision: https://reviews.llvm.org/D109531
2021-09-14 18:13:29 -07:00
Matt Arsenault 4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Matt Arsenault 88146230e1 SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code
ConstantOffsetExtractor::Find was infinitely recursing on the add
referencing itself.
2021-09-14 19:49:38 -04:00
Matt Arsenault fdd9761dd1 Attributor: Fix crash on undef in !callees 2021-09-14 19:49:34 -04:00
Matt Arsenault f12174204c AMDGPU: Rename attributor class for uniform-work-group-size
This isn't really an AMDGPU specific attribute and could be moved to
generic code. It's also important to include the word uniform in the
name.
2021-09-14 19:49:08 -04:00
David Blaikie 4cabaf594a NFC: DebugInfo: refactor pretty printing into a utility class
Laying more foundation for full template name rebuilding - more complex
type printing benefits from an object to carry some state rather than
passing it around as parameters to every function.
2021-09-14 15:54:29 -07:00
Philip Reames 0dd755f027 [SCEV] Stop applying contextual flags in applyLoopGuards
This fixes a violation of the wrap flag rules introduced in c4048d8f. As noted in the original review, the NUW is legal to infer from the structure of the replacee, but a) there's no test coverage, and b) this should be done generically for all multiplies.

Differential Revision: https://reviews.llvm.org/D109782
2021-09-14 14:14:52 -07:00
David Tenty 26b8031774 [CMake][AIX] Disable visibility options in build
Visibility options currently have limited support on AIX and may cause warnings or errors
depending on the build compiler used.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D108467
2021-09-14 16:05:12 -04:00
Heejin Ahn 468c4409f6 Revert "[WebAssembly] Rethrow longjmp in EH handling if EmSjLj is enabled"
This reverts commit b7b4ebbcfa.

Reason: This breaks several code-size tests in Emscripten test suite
because this exports `emscripten_longjmp` for programs that didn't do it
before.
2021-09-14 12:59:42 -07:00
Joe Nash 3ce1b9631a [AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109536

Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
2021-09-14 15:11:27 -04:00
Florian Hahn 7359450e6a
[VPlan] Queue (block, operand) pairs together (NFC).
Instead of discovering the sink-to block for each operand in the main
loop, the sink-to block can instead be directly queued with the
operands.

This simplifies processing in the main loop and is a NFC change split
off from D104254 as suggested there.
2021-09-14 20:02:51 +01:00
Bjorn Pettersson cd2bff1ef1 [StackColoring] Fix a debug invariance problem
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).

Differential Revision: https://reviews.llvm.org/D109758
2021-09-14 19:21:56 +02:00
Kazu Hirata d9e46beace [IPO] Use make_early_inc_range (NFC) 2021-09-14 08:59:36 -07:00
Sam Clegg 6ee55f9ab5 Fix test failure created by ef8c9135ef
Followup to https://reviews.llvm.org/D108877 to fix test
failure.
2021-09-14 07:35:05 -07:00
Sam Clegg ef8c9135ef [WebAssembly] Allow import and export of TLS symbols between DSOs
We previously had a limitation that TLS variables could not
be exported (and therefore could also not be imported).  This
change removed that limitation.

Differential Revision: https://reviews.llvm.org/D108877
2021-09-14 06:47:37 -07:00
Amy Kwan 5041a485b9 [PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation
This patch exploits the prefixed load and store instructions utilizing the
refactored load/store implementation introduced in D93370.

Prefixed load and store instructions are emitted whenever we are loading or
storing a value with an offset that fits into a 34-bit signed immediate.
Patterns for the prefixed load and stores are added in this patch, as well as
the implementation that detects when we are loading and storing a value with an
offset that fits in 34-bits.

Differential Revision: https://reviews.llvm.org/D96075
2021-09-14 08:39:49 -05:00
Florian Hahn e248d69036
Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Fixes PR50296, PR50288.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D102266
2021-09-14 11:19:12 +01:00
David Green 5a6dfbb8cd [ARM] Teach DemandedVectorElts about VMOVN lanes
The class of instructions that write to narrow top/bottom lanes only
demand the even or odd elements of the input lanes. Which means that a
pair of VMOVNT; VMOVNB demands no lanes from the original input. This
teaches that to instcombine from the target hooks available through
ARMTTIImpl.

Differential Revision: https://reviews.llvm.org/D109325
2021-09-14 11:05:31 +01:00
Tim Northover f287405419 AArch64: fix indentation of ProcAppleA14. NFC. 2021-09-14 10:04:15 +01:00
Cullen Rhodes 6fbc167c0a [WebAssembly] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D109689
2021-09-14 08:26:15 +00:00
Cullen Rhodes 742cf3996e [AArch64] NFC: Use 'asm' in SIMDScalarCPY
Fixes a warning identified in D109359. The mnemonic is also mov, not
cpy.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109573
2021-09-14 08:26:15 +00:00
Martin Storsjö ac3edc4c97 [Win64EH] Write .pdata symbol relocations relative to the temporary begin symbol
Previously the relocations pointed at the public user facing,
possibly external symbol.

When the function itself is weak, that symbol may be overridden at
link time, pointing at another strong implementation of the same
function instead. In that case, there's two conflicting pdata entries
pointing at the same address, and the wrong unwind info might end up
used.

Both GCC/binutils and MSVC produce pdata pointing at internal static
symbols. (GCC/binutils point at the .text section just as LLVM does
after this change, MSVC points at special label type symbols with the
type IMAGE_SYM_CLASS_LABEL and names like '$LN4'.)

This fixes unwinding through an overridden "operator new" with a
statically linked C++ library in MinGW mode. (Building libc++ with
-ffunction-sections and linking with --gc-sections might avoid the
issue too.)

This makes the produced object files a little less user friendly
to debug, but with other recent improvements for llvm-readobj, the
unwind info debugging experience should be pretty much the same.

Differential Revision: https://reviews.llvm.org/D109651
2021-09-14 11:05:37 +03:00
Heejin Ahn e85ed44373 [WebAssembly] Fix a typo in comments 2021-09-14 00:45:02 -07:00
Esme-Yi b98c3e957f [yaml2obj][XCOFF] add the SectionIndex field for symbol.
Summary: Add the SectionIndex field for symbol.
1: a symbol can reference a section by SectionName or SectionIndex.
2: a symbol can reference a section by both SectionName and SectionIndex.
3: if both Section and SectionIndex are specified, but the two values refer
   to different sections, an error will be reported.
4: an invalid SectionIndex is allowed.
5: if a symbol references a non-existent section by SectionName, an error will be reported.

Reviewed By: jhenderson, Higuoxing

Differential Revision: https://reviews.llvm.org/D109566
2021-09-14 06:18:03 +00:00
Chris Lattner 8b4afc5aef [APInt] Add a concat method, use LLVM_UNLIKELY to help optimizer.
Three unrelated changes:

1) Add a concat method as a convenience to help write bitvector
   use cases in a nicer way.
2) Use LLVM_UNLIKELY as suggested by @xbolva00 in a previous patch.
3) Fix casing of some "slow" methods to follow naming standards.

Differential Revision: https://reviews.llvm.org/D109620
2021-09-13 22:02:54 -07:00
Chen Zheng 946e69d253 [PowerPC] prepare more loop load/store instructions
PPCLoopInstrFormPrep pass now can prepare for load store instructions
in a loop whose increment is not a constant integer.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105872
2021-09-14 05:00:48 +00:00
Matt Arsenault c305513cc2 AMDGPU: Fix assert with indirect call with known required inputs
The attributor can determine that some indirect calls do not require
special inputs. The special inputs will still be present in the ABI,
so we need to allocate the registers and pass undefs.
2021-09-13 22:54:11 -04:00
Lang Hames 2c8e784915 [ORC] Add Shared/OrcRTBridge, and TargetProcess/OrcRTBootstrap.
This is a small first step towards reorganization of the ORC libraries:

Declarations for types and function names (as strings) to be found in the
"ORC runtime bootstrap" set are moved into OrcRTBridge.h / OrcRTBridge.cpp.

The current implementation of the "ORC runtime bootstrap" functions is moved
into OrcRTBootstrap.h and OrcRTBootstrap.cpp. It is likely that this code will
eventually be moved into ORT-RT proper (in compiler RT).

The immediate goal of this change is to make these bootstrap functions usable
for clients other than SimpleRemoteEPC/SimpleRemoteEPCServer. The first planned
client is a new RuntimeDyld::MemoryManager that will run over EPC, which will
allow us to remove the old OrcRemoteTarget code.
2021-09-14 10:19:45 +10:00
Brendon Cahoon 42dace9c5b [Hexagon] Use getTypeAllocSize to compute difference between objects
The code was using getTypeStoreSize to calculate the difference
between consecutive objects. The calculation was incorrect due
to padding that is added between consecutive objects. The
getTypeAllocSize includes the padding amount. For example,
if the type is [19 x i8], the difference between consecutive
objects is 32 bytes, not 19 bytes.

A second case for getTypeAllocSize is needed when computing
the pointer values for the vector accesses. The calculation needs
to account for the padding as well.

Differential Revision: https://reviews.llvm.org/D109403
2021-09-13 19:04:59 -05:00
Ankit Aggarwal a72763af67 [Hexagon] Handle bitcast of i64/i128 -> v64i1/v128i1 2021-09-13 18:52:30 -05:00
Kuba Mracek e80ee4cbd9 [GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol
This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot.

Differential Revision: https://reviews.llvm.org/D109169
2021-09-13 15:22:11 -07:00
Heejin Ahn c55b6c593b [WebAssembly] Handle _setjmp and _longjmp in SjLj
In some platforms `_setjmp` and `_longjmp` are used instead of `setjmp`
and `longjmp`. This CL adds support for them.

Fixes https://github.com/emscripten-core/emscripten/issues/14999.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D109669
2021-09-13 14:20:04 -07:00
Heejin Ahn b7b4ebbcfa [WebAssembly] Rethrow longjmp in EH handling if EmSjLj is enabled
This is a fix on top of D106525's Case 2. In D106525, in
`runEHOnFunction` which handles Emscripten EH, We rethrow `longjmp` only
when the module has any usage of `setjmp` or `longjmp`. But now Wasm
object files are linked using wasm-ld, the module this pass sees is not
the whole program, and even if this module does not contain any
`longjmp`, another file can contain it and can be linked with the
current module. This enables the rethrowing of longjmp whenever
Emscripten SjLj is enabled, regardless of whether it is used in this
module or not.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D109670
2021-09-13 14:15:25 -07:00
Florian Mayer 0a22510f3e [value-tracking] see through returned attribute.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109675
2021-09-13 20:52:26 +01:00
Florian Mayer 5b5d774f5d [hwasan] Respect returns attribute when tracking values.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109233
2021-09-13 20:52:24 +01:00
Jon Chesterfield 71052ea1e3 [openmp] Apply code change from D109500 2021-09-13 18:33:53 +01:00
Jon Chesterfield bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f6.
Going to re-commit it in pieces for easier application to 13
2021-09-13 18:25:07 +01:00
Philip Reames 6fec6552f5 Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"
This reverts commit 5a6dfb27ca.  See original review for why.
2021-09-13 10:11:18 -07:00
Philip Reames 5746c76f3f Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration"
This reverts commit d9ca444835.  See review for why.
2021-09-13 10:10:49 -07:00
vnalamot 726b5d3416 [RegScavenger][NFC] Refer to the already initialized local variable for spill slot index
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109501
2021-09-13 21:55:33 +05:30
Kazu Hirata abca4c012f [Utils] Use make_early_inc_range (NFC) 2021-09-13 08:57:23 -07:00
Simon Pilgrim 9db20822f7 [APInt] Add APIntOps::ScaleBitMask helper
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.

When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type.

This patch add a APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the the helper instead of their own implementation.

This came up on D109065 where we currently have to add yet another implementation of the same code.

Differential Revision: https://reviews.llvm.org/D109683
2021-09-13 16:27:12 +01:00
vnalamot 0fc3ebb70a [SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109674
2021-09-13 20:48:04 +05:30
dpalermo d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Anna Thomas b4e787d8f4 [InstCombining] Refactor checks for TryToSinkInstruction. NFC
Moved out the checks for profitability of TryToSinkInstructions
into a lambda function.
This will also allow us to easily add checks for bailing out if the
transform is not profitable.

Tests-Run: instCombine tests.
2021-09-13 09:04:34 -04:00
Stefan Gränitz 9691851582 [JITLink] Factor out forEachRelocation() function from addRelocations() in ELF Aarch64 backend (NFC)
First step in reducing redundancy in `addRelocations()` implementations across ELF JITLink backends. The patch factors out common logic for ELF relocation traversal into the new helper function `forEachRelocation()` in the `ELFLinkGraphBuilder` base class. For now, this is applied to the Aarch64 implementation. Others may follow soon.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D109516
2021-09-13 14:59:38 +02:00
Tim Northover 5d070c8259 SwiftAsync: use runtime-provided flag for extended frame if back-deploying
When back-deploying Swift async code we can't always toggle the flag showing an
extended frame is present because it will confuse unwinders on systems released
before this feature. So in cases where the code might run there, we `or` in a
mask provided by the runtime (as an absolute symbol) telling us whether the
unwinders can cope.

When deploying only for newer OSs, we can still hard-code the bit-set for
greater efficiency.
2021-09-13 13:54:46 +01:00
Cullen Rhodes 1d771e19fd [AArch64] NFC: Remove unused template args
Identified in D109359.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109491
2021-09-13 10:39:33 +00:00
Florian Hahn c24fc37e47
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
David Truby 915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalistaion of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Cullen Rhodes 97a6d76694 [Hexagon] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D109604
2021-09-13 10:09:08 +00:00
Cullen Rhodes 9e435c96de [Lanai] NFC: Remove unused tblgen template arg 'OpNode'
Identified in D109359.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D109606
2021-09-13 10:09:08 +00:00
Jingu Kang 2a26d47a2d [LoopBoundSplit] Check the start value of split cond AddRec
After transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check the start value of the split
cond AddRec satisfies the split condition.

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 10:32:35 +01:00
Jay Foad 477b9bc9f7 [AMDGPU] Minor cleanup after D109483. NFC. 2021-09-13 10:27:15 +01:00
Esme-Yi 909f3d7380 [yaml2obj][XCOFF] customize the string table
Summary: The patch adds support for yaml2obj customizing the string table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D107421
2021-09-13 09:24:38 +00:00
David Sherwood bbada9ff45 [NFC] Replace unsigned VF with ElementCount in EpilogueLoopVectorizationInfo
This patch simply replaces any unsigned VFs with ElementCounts. It's
still NFC because at the moment epilogue vectorisation is disabled
when the main vector loop uses scalable vectors.

Differential Revision: https://reviews.llvm.org/D109364
2021-09-13 10:18:30 +01:00
Jim Lin f29336104d [RISCV] Rename prefix `FeatureExt*` to `FeatureStdExt*` for all sub-extension
Rename prefix `FeatureExt*` to `FeatureStdExt*` for all sub-extension for consistency

Reviewed By: HsiangKai, asb

Differential Revision: https://reviews.llvm.org/D108187
2021-09-13 16:24:15 +08:00
Esme-Yi ea81898d0f [XCOFF] Fix the program abortion issue in XCOFFObjectFile::getSectionContents.
Summary: Use std::move(E) to avoid `Program aborted due to an unhandled Error`

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D109567
2021-09-13 07:54:33 +00:00
Simon Pilgrim 65ad09da0e [X86][SLM] Fix DIVPD/DIVPS/RCPPS/RSQRTPS/SQRTPD/SQRTPS/DPPD/DPPS uops, latency and throughput
The packed variants of the instructions had been modelled as the same as the scalar variants.

Reported during a run of llvm-exegesis on a cheap SLM box and matches what Agner / InstLatX64 report as well.
2021-09-13 08:36:43 +01:00
luxufan ff6069b891 [JITLink] Add initial native TLS support to ELFNix platform
This patch use the same way as the https://reviews.llvm.org/rGfe1fa43f16beac1506a2e73a9f7b3c81179744eb to handle the thread local variable.

It allocates 2 * pointerSize space in GOT to represent the thread key and data address. Instead of using the _tls_get_addr function, I customed a function __orc_rt_elfnix_tls_get_addr to get the address of thread local varible. Currently, this is a wip patch, only one TLS relocation R_X86_64_TLSGD is supported and I need to add the corresponding test cases.

To allocate the TLS  descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a `Edge::Kind K` argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not suitable, I can think further to solve this problem.

Differential Revision: https://reviews.llvm.org/D109293
2021-09-13 14:35:49 +08:00
Arthur Eubanks 6a92ab07cb [NFC][CoroSplit] Directly use Function::getFunctionType() 2021-09-12 21:34:19 -07:00
Max Kazantsev d9ca444835 [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop and their conditions simplified away.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
2021-09-13 11:30:55 +07:00
Max Kazantsev 5a6dfb27ca [IndVars] Replace PHIs if loop exits on 1st iteration
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
2021-09-13 10:50:33 +07:00
Arthur Eubanks d48a3f9f75 [NFC] Directly use OpenMPIRBuilder::Ident instead of IdentPtr->getPointerElementType() 2021-09-12 20:45:44 -07:00
Kuter Dinel 9a193bdc81 [Attributor][FIX] AACallEdges, fix propagation error.
This patch fixes a error made in 2cc6f7c8e1. That patch
added a call site position but there was a small error with the way
the presence of a unknown call edge was being propagated from call site
to function. This patch fixes that error. This error was effecting some
AMDGPU tests.
2021-09-13 03:45:26 +03:00
Arthur Eubanks f94a118a6e [NFC] Avoid using pointee types in PPCISelLowering
A cmpxchg's new value type is the same as the pointer operand's pointee type.
2021-09-12 17:37:35 -07:00
Craig Topper 283879793d [RISCV] Initial support .insn directive for the assembler.
This allows for a custom encoding to be emitted. It can also be
used with inline assembly to allow the custom instruction to be
register allocated like other instructions.

I initially started from SystemZ's implementation, but some of
the formats allow operands to be specified in multiple ways so I
had to add support for matching different operand class lists for
the same format. That implementation is a simplified version of
what is emitted by tablegen for regular instructions.

I've left out the compressed formats. And I haven't supported the
named opcodes like LUI or OP_IMM_32. Those can be added in future
patches.

Documentation can be found here https://sourceware.org/binutils/docs-2.37/as/RISC_002dV_002dFormats.html

Reviewed By: jrtc27, MaskRay

Differential Revision: https://reviews.llvm.org/D108602
2021-09-12 15:56:12 -07:00
Kuter Dinel 66a0b3464c [Attributor] AAFunctionReachability, Handle CallBase Reachability.
This patch makes it possible to query callbase reachability
(Can a callbase reach a function Fn transitively).
The patch moves the reachability query handling logic to a member class,
this class will have more users within the AA once we add other function
reachability queries.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106402
2021-09-13 01:35:44 +03:00
Kuter Dinel 2cc6f7c8e1 [Attributor] Create a call site position for AACalledges
This patch adds a call site position for AACallEdges, this
allows us to ask questions about which functions a specific
`CallBase` might call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106208
2021-09-13 01:17:05 +03:00
Florian Hahn 368af7558e
[VPlan] Fix crash caused by not updating all users properly.
Users of VPValues are managed in a vector, so we need to be more
careful when iterating over users while updating them. For now, just
copy them.

Fixes 51798.
2021-09-12 18:10:53 +01:00
Nikita Popov 4189e5fe12 [CGP] Support opaque pointers in address mode fold
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.

In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
2021-09-12 17:43:37 +02:00
Kazu Hirata 8e86c0e4f4 [Scalar] Use make_early_inc_range (NFC) 2021-09-12 08:17:18 -07:00
Sanjay Patel 3a126134d3 [InstCombine] remove casts from splat-a-bit pattern
https://alive2.llvm.org/ce/z/_AivbM

This case seems clear since we can reduce instruction count
and avoid an intermediate type change, but we might want to
use mask-and-compare for other sequences.

Currently, we can generate more instructions on some related
patterns by trying to use bit-hacks instead of mask+cmp, so
something is not behaving as expected.
2021-09-12 09:18:14 -04:00
Sam Clegg b78c85a44a [WebAssembly] Convert to new "dylink.0" section format
This format is based on sub-sections (like the "linking" and "name"
sections) and is therefore easier to extend going forward.

spec change: https://github.com/WebAssembly/tool-conventions/pull/170
binaryen change: https://github.com/WebAssembly/binaryen/pull/4141
wabt change:  https://github.com/WebAssembly/wabt/pull/1707
emscripten change: https://github.com/emscripten-core/emscripten/pull/15019

Differential Revision: https://reviews.llvm.org/D109595
2021-09-12 05:30:38 -07:00
Lang Hames b64fc0af9a [ORC] Add bootstrap symbols to ExecutorProcessControl.
Bootstrap symbols are symbols whose addresses may be required to bootstrap
the rest of the JIT. The bootstrap symbols map generalizes the existing
JITDispatchInfo class provide an arbitrary map of symbol names to addresses.

The JITDispatchInfo class will be replaced by bootstrap symbols with reserved
names in upcoming commits.
2021-09-12 18:49:43 +10:00
Lang Hames e339303776 [ORC] Add OrcTargetProcess dependency on LLVM_PTHREAD_LIB 2021-09-12 18:17:06 +10:00
Lang Hames 698a598cf7 [ORC] Add OrcShared dependency on LLVM_PTHREAD_LIB 2021-09-12 16:02:02 +10:00
Lang Hames d193d23795 [ORC] Fix missing std::move 2021-09-12 15:27:19 +10:00
Lang Hames d11a0c5d91 [ORC] Fix out-of-range comparison errors. 2021-09-12 14:48:05 +10:00
Lang Hames bb72f07380 Re-apply bb27e45643 and 5629afea91 with fixes.
This reapplies bb27e45643 (SimpleRemoteEPC
support) and 2269a941a4 (#include <mutex>
fix) with further fixes to support building with LLVM_ENABLE_THREADS=Off.
2021-09-12 14:23:22 +10:00
Kazu Hirata 15e9575fb5 [Vectorize] Fix "unused variable" warnings 2021-09-11 12:06:43 -07:00
Nikita Popov 45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Sanjay Patel 75e8eb2b10 [InstCombine] update code/test comments; NFC
Follow-up for post-commit suggestion on:
28afaed691

The comments were partly copied from the original
code, but not updated to match the new code.
2021-09-11 10:53:53 -04:00
Nikita Popov f5806830e0 [ARM] Support neon.vld auto-upgrade with opaque pointers
This code manually constructs the intrinsic name, so we need to
use p0 instead of p0i8 in opaque pointer mode.
2021-09-11 16:34:32 +02:00
Kazu Hirata e030d31fda [GlobalOpt] Use make_early_inc_range (NFC) 2021-09-11 07:23:22 -07:00
Sanjay Patel 28afaed691 [InstCombine] fold sub of min/max intrinsics with invertible ops
This is a translation of the existing code to handle the intrinsics
and another step towards D98152.

https://alive2.llvm.org/ce/z/jA7eBC

This pattern is already handled by underlying folds if there are
less uses, so the minimal tests in this case have extra uses.

The larger cmyk tests show the motivation - when combined with
other folds, we invert a larger sequence and eliminate 'not' ops.
2021-09-11 09:18:46 -04:00
guopeilin 749ddd25e9 [BitcodeReader] Delay select until all constants resolved
Like the shuffle, we should treat the select delayed so that
all constants can be resolved.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109053
2021-09-11 18:51:35 +08:00
Simon Pilgrim df975e4590 [X86][SLM] Fix PSAD/MPSAD uops, latency and throughput
Noticed while trying to improve generic reduction costs via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-11 11:44:09 +01:00
Simon Pilgrim 484944ac3b [X86][SLM] Fix HADD/HSUB uops, latency and throughput
Noticed while trying to improve generic reduction costs via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-11 11:44:09 +01:00
Simon Pilgrim 51d04e2268 [X86][SLM] Swap LoadLat and LoadUOps in the SLMWriteResPair<> helper. NFC.
We set the LoadUOps argument a lot more frequently that LoadLat, by swapping them we can simplify a number of declarations.
2021-09-11 11:44:09 +01:00
Lang Hames 2269a941a4 Revert 5629afea91 and bb27e45643 while I look into bot failures.
This reverts commit 5629afea91 ("[ORC] Add missing
include."), and bb27e45643 ("[ORC] Add
SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport.").

The SimpleRemoteEPC patch currently assumes availability of threads, and needs
to be rewritten with LLVM_ENABLE_THREADS guards.
2021-09-11 19:02:11 +10:00
Lang Hames bb27e45643 [ORC] Add SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport.
SimpleRemoteEPC is an ExecutorProcessControl implementation (with corresponding
new server class) that uses ORC SimplePackedSerialization (SPS) to serialize and
deserialize EPC-messages to/from byte-buffers. The byte-buffers are sent and
received via a new SimpleRemoteEPCTransport interface that can be implemented to
run SimpleRemoteEPC over whatever underlying transport system (IPC, RPC, network
sockets, etc.) best suits your use case.

The SimpleRemoteEPCServer class provides executor-side support. It uses a
customizable SimpleRemoteEPCServer::Dispatcher object to dispatch wrapper
function calls to prevent the RPC thread from being blocked (a problem in some
earlier remote-JIT server implementations). Almost all functionality (beyond the
bare basics needed to bootstrap) is implemented as wrapper functions to keep the
implementation simple and uniform.

Compared to previous remote JIT utilities (OrcRemoteTarget*,
OrcRPCExecutorProcessControl), more consideration has been given to
disconnection and error handling behavior: Graceful disconnection is now always
initiated by the ORC side of the connection, and failure at either end (or in
the transport) will result in Errors being delivered to both ends to enable
controlled tear-down of the JIT and Executor (in the Executor's case this means
"as controlled as the JIT'd code allows").

The introduction of SimpleRemoteEPC will allow us to remove other remote-JIT
support from ORC (including the legacy OrcRemoteTarget* code used by lli, and
the OrcRPCExecutorProcessControl and OrcRPCEPCServer classes), and then remove
ORC RPC itself.

The llvm-jitlink and llvm-jitlink-executor tools have been updated to use
SimpleRemoteEPC over file descriptors. Future commits will move lli and other
tools and example code to this system, and remove ORC RPC.
2021-09-11 18:16:38 +10:00
Jessica Paquette 4e408aae2c [AArch64][GlobalISel] Select full-fp16 s16 G_FCONSTANT as a constant pool load
When we have full-fp16 support, we should (manually select) s16 G_FCONSTANT to
a constant pool load.

Add support for that to `emitLoadFromConstantPool` + the existing constant
selection code.

Also tidy up the constant selection code a little. There were some out-of-date
comments + some dead code.

Differential Revision: https://reviews.llvm.org/D108957
2021-09-10 19:36:34 -07:00
Lang Hames 6c56b13331 [JITLink] Working memory shouldn't be subject to alignment constraints.
Refactors copyBlockContentToWorkingMemory to use offsets rather than direct
pointers to working memory. This simplifies the problem of maintaining
alignments between blocks in working memory, without requiring the working
memory itself to be aligned.
2021-09-11 11:26:38 +10:00
Lang Hames 3828ab086a [ORC] Fix missing newline in debugging output. 2021-09-11 11:24:01 +10:00
Lang Hames 22641f5853 [ORC] Use EPC for EPCGeneric MemoryAccess / JITLinkMemoryManager construction.
This allows these classes to be created during EPC construction, before an
ExecutionSession is available.
2021-09-11 11:24:00 +10:00
Usman Nadeem ab111e982f Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation""
This reverts commit eee7d225de.
Effectively relanding 98c37247d8
after fixing the failing tests.

Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5
2021-09-10 18:11:24 -07:00
Eric Christopher 2d26a72f82 nullptr initialize variables, spotted on msan bots. 2021-09-10 18:10:53 -07:00
Joseph Huber 29b44ca896 [OpenMP] Add flag for setting debug in the offloading device
This patch introduces the flags `-fopenmp-target-debug` and
`-fopenmp-target-debug=` to set the value of a global in the device.
This will be used to enable or disable debugging features statically in
the device runtime library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109544
2021-09-10 18:19:19 -04:00
Joseph Huber 7eb899cbcd [OpenMP] Add more verbose remarks for runtime folding
We peform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the users control. However, people may still wish to view  this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109627
2021-09-10 17:36:06 -04:00
Johannes Doerfert 99ea8ac9f1 Reapply "[OpenMP] Group side-effects to improve guarding efficiency"
This reapplies ca134c3963, effectively
reverting commit d2f206e0af.

Minor test changes to make the test pass.
2021-09-10 15:22:57 -05:00
Johannes Doerfert c09fbbdcfb Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals""
This reapplies commit 7dbba3376f, or, put
differently, this reverts commit d9a8d20827.

The test now requires the amdgpu and nvptx backend explicitly as it
won't work without properly.
2021-09-10 15:22:56 -05:00
Mark Schimmel 7c82db3634 [ARC] Improve code generated for i32 ADDC/ADDE and SUBC/SUBE
This change improves the code generated for long long addition and subtraction

Differential Revision: https://reviews.llvm.org/D109615
2021-09-10 13:04:08 -07:00
Usman Nadeem eee7d225de Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"
This reverts commit 98c37247d8.
2021-09-10 13:01:48 -07:00
Usman Nadeem 98c37247d8 [AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation
Differential Revision: https://reviews.llvm.org/D109118

Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3
2021-09-10 12:52:14 -07:00
Joseph Huber 9e2fc0ba37 [OpenMP] Check OpenMP assumptions on call-sites as well
This patch adds functionality to check assumption attributes on call
sites as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109376
2021-09-10 14:52:47 -04:00
Florian Mayer 09391e7e50 [hwasan] Do not instrument accesses to uninteresting allocas.
This leads to a statistically significant improvement when using -hwasan-instrument-stack=0: https://bit.ly/3AZUIKI.
When enabling stack instrumentation, the data appears gets better but not statistically significantly so. This is consistent
with the very moderate improvements I have seen for stack safety otherwise, so I expect it to improve when the underlying
issue of that is resolved.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108457
2021-09-10 19:28:28 +01:00
Florian Mayer 57335b6e2e [stack-safety] Allow to determine safe accesses.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109503
2021-09-10 19:23:54 +01:00
Kazu Hirata c9fca53af1 [CodeGen, Target] Use pred_empty and succ_empty (NFC) 2021-09-10 11:11:31 -07:00
Huihui Zhang da4a2fd832 [AArch64ISelLowering] Fix null pointer access in performSVEAndCombine.
When combining 'and' of an unsigned unpack and shuffle instruction,
bail early if shuffle is not constructed from a constant integer.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D109556
2021-09-10 10:36:43 -07:00
Anton Afanasyev 54d8ebbbfd [AggressiveInstCombine] Add `udiv` and `urem` instrs to TruncInstCombine DAG
Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`,
allowing TruncInstCombine to reduce bitwidth of expressions containing these
instructions. It is sufficient to require that all truncated bits of both
operands are zeros: https://alive2.llvm.org/ce/z/yiithn
(`urem` case is identical).

Differential Revision: https://reviews.llvm.org/D109515
2021-09-10 20:29:08 +03:00
Johannes Doerfert d2f206e0af Revert "[OpenMP] Group side-effects to improve guarding efficiency"
This reverts commit ca134c3963.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:24:00 -05:00
Johannes Doerfert d9a8d20827 Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reverts commit 7dbba3376f.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:23:08 -05:00
Johannes Doerfert 7dbba3376f [GlobalOpt][FIX] Do not embed initializers into AS!=0 globals
Not all address spaces support initializers for globals and we can
therefore not set them without checking if they are allowed. This
patch adds a hook into TTI to check if an AS allows non-undef
initializers. We disable it for all but address space 0 by default,
NVPTX and AMDGPU targets allow all but address space 3.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109337
2021-09-10 12:08:50 -05:00
Johannes Doerfert ca134c3963 [OpenMP] Group side-effects to improve guarding efficiency
When we guard side-effects as part of SPMDzation we do it for
consecutive instructions that need guarding. This patch will try to
reorder guarded side-effects in a block to decrease the number of
guarded regions we need. It does not use any smarts, e.g., alias
analysis, to move side-effects over non-interfering reads. Instead,
it only moves side-effects downwards to the next guarded side-effect
if there was nothing in between that could have possibly be affected.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D109070
2021-09-10 12:08:48 -05:00
David Green deefeffb5d [ARM] Remove unused tblgen arguments. NFC
As per D109359, this removes or makes use of some of the existing unused
NEON and base ARM tblgn arguments.
2021-09-10 18:03:54 +01:00
Nikita Popov 14afbe9448 [CallLowering] Support opaque pointers
Always use the byval/inalloca/preallocated type (which is required
nowadays), don't fall back on the pointer element type.

This requires adding Function::getParamPreallocatedType() to
mirror the CallBase API, so that the templated code can work with
both.
2021-09-10 18:32:12 +02:00
Nikita Popov d34d2bbe5d [IR] Remove unused parameter (NFC) 2021-09-10 18:16:22 +02:00
Craig Topper 1b736bda3b [RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr
LICM may have pulled out a splat, but with .vx instructions we
can fold it into an operation.

This patch enables CGP to reverse the LICM transform and move the
splat back into the loop.

I've started with the commutable integer operations and shifts, but we can
extend this with more operations in future patches.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109394
2021-09-10 09:04:01 -07:00
Craig Topper 6c7cadb8c1 [RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype.
This can avoid a vsetvl after a tail undisturbed operation.

Differential Revision: https://reviews.llvm.org/D109549
2021-09-10 09:03:20 -07:00
David Green 6b7cdb40da [ARM] Remove unused tblgen arguments. NFCI
As per D109359, this removes or makes use of some of the existing unused
MVE tblgn arguments.
2021-09-10 15:06:31 +01:00
Sam Clegg e4b2f3054a [WebAssembly][libObject] Avoid re-use of Section object during parsing
The re-use of this struct across iterations of the loop was causing
fields (specifically Name) to be incorrectly shared between multiple
sections.

Differential Revision: https://reviews.llvm.org/D108984
2021-09-10 09:30:50 -04:00
Nikita Popov 90ec6dff86 [OpaquePtr] Forbid mixing typed and opaque pointers
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.

This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.

The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.

I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.

Differential Revision: https://reviews.llvm.org/D109290
2021-09-10 15:18:23 +02:00
Sander de Smalen ec7d8d5069 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening).
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type requires widening.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109509
2021-09-10 13:29:26 +01:00
Sander de Smalen 801a745dd2 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors.
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type is either legal
or requires splitting.

The idea is that the operation is broken down into simpler steps,
by first extracting a smaller subvector until the input vector
becomes legal or requires promotion.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109313
2021-09-10 13:29:26 +01:00
Sjoerd Meijer 6a076fa953 [LoopFlatten] Make the analysis more robust after IV widening
LoopFlatten wasn't triggering on this motivating case after IV widening:

  void foo(int *A, int N, int M) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < M; ++j)
        f(A[i*M+j]);
  }

The reason was that the old induction phi nodes were getting in the way. These
narrow and dead induction phis are not always trivially dead, and having both
the narrow and wide IVs confused the analysis and caused it to bail. This adds
some extra bookkeeping for these old phis, so we can filter them out when
checks on phi nodes are performed. Other clean up passes will get rid of these
old phis and increment instructions.

As this was one of the motivating examples from the beginning, it was
surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG
is just slightly different.

Differential Revision: https://reviews.llvm.org/D109309
2021-09-10 12:34:04 +01:00
Rosie Sumpter 9d1bea9c88 [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
For SVE, when scalarising the PHI instruction the whole vector part is
generated as opposed to creating instructions for each lane for fixed-
width vectors. However, in some cases the lane values may be needed
later (e.g for a load instruction) so we still need to calculate
these values to avoid extractelement being called on the vector part.

Differential Revision: https://reviews.llvm.org/D109445
2021-09-10 11:58:04 +01:00
Serge Bazanski 788e7b3b8c [Lanai] implement wide immediate support
This fixes LanaiTTIImpl::getIntImmCost to return valid costs for i128
(and wider) values. Previously any immediate wider than
64 bits would cause Lanai llc to crash.

A regression test is also added that exercises this functionality.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D107091
2021-09-10 10:54:43 +00:00
Serge Bazanski 231bfaab31 [Lanai] fix MC / objdump
D78776 removed is{Call,Branch,UnconditionalBranch} guards in objdump
before calling MCInstrAnalysis::evaluateBranch. This is fine for other
architectures as they gracefully handle evaluateBranch being called on
non-branches. However, the Lanai MCInstrAnalysis implementation didn't
and that change caused it to crash.

This inserts the same guards back into Lanai's evaluateBranch
implementation and adds a smoke test that exercises `llc | objdump` so
this kind of regression is hopefully caught next time.

Reviewed By: jpienaar, MaskRay

Differential Revision: https://reviews.llvm.org/D107593
2021-09-10 10:46:13 +00:00
Florian Hahn 5d1a6d0d1a
[ARM] Remove unnecessary use of replaceSymbolicStrideSCEV (NFC).
When passing an empty strides map, there's nothing to replace for
replaceSymbolicStrideSCEV and it just returns the SCEV for Ptr. There
should be no need to call the function.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109462
2021-09-10 10:47:26 +02:00
Sjoerd Meijer 4f9217c519 [FuncSpec] Don't specialise call sites that have the MinSize attribute set
The MinSize attribute can be attached to both the callee and the caller
in the callsite. Function specialisation was already skipped for function
declarations (callees) with MinSize. This also skips specialisations for
the callsite when it has MinSize set.

Differential Revision: https://reviews.llvm.org/D109441
2021-09-10 09:01:45 +01:00
Alexey Lapshin 3493540830 [DebugInfo][NFC] Erase capacity in DWARFUnit::clearDIEs().
DWARFUnit::clearDIEs() uses std::vector::shrink_to_fit() to make
capacity of DieArray matched with its size(). The shrink_to_fit()
is not binding request to make capacity match with size().
Thus the memory could still be reserved after DWARFUnit::clearDIEs()
is called. This patch erases capacity when DWARFUnit::clearDIEs() is requested.
So the memory occupied by dies would be freed.

Differential Revision: https://reviews.llvm.org/D109499
2021-09-10 10:07:28 +03:00
Chris Lattner 704a395693 [APInt] Enable APInt to support zero bit integers.
Motivation: APInt not supporting zero bit values leads to
a lot of special cases in various bits of code, particularly
when using APInt as a bit vector (where you want to start with
zero bits and then concat on more.  This is particularly
challenging in the CIRCT project, where the absence of zero-bit
ConstantOp forces duplication of ops and makes instcombine-like
logic far more complicated.

Approach: zero bit integers are weird.  There are two reasonable
approaches: either make it illegal to do general arithmetic on
them (e.g. sign extends), or treat them as as implicitly having
a zero value.  This patch takes the conservative approach, which
enables their use in bitvector applications.

Differential Revision: https://reviews.llvm.org/D109555
2021-09-09 22:43:54 -07:00
hsmahesha 0c28814015 Revert "[AMDGPU] Split entry basic block after alloca instructions."
This reverts commit 98f4713122.

Without any (theoretical/practical) guarantee that all the allocas within
*entry* basic block are clustered together at the beginning of the block,
this patch is doomed to fail. Hence reverting it.
2021-09-10 10:23:51 +05:30
Yonghong Song e52617c31d BPF: change BTF_KIND_TAG format
Previously we have the following binary representation:
    struct bpf_type { name, info, type }
    struct btf_tag { __u32 component_idx; }
If the tag points to a struct/union/var/func type, we will have
   kflag = 1, component_idx = 0
if the tag points to struct/union member or func argument, we will have
   kflag = 0, component_idx = 0, ..., vlen - 1

The above rather makes interface complex to have both kflag and
component needed to determine its legality and index.

This patch simplifies the interface by removing kflag involvement.
   component_idx = (u32)-1 : tag pointing to a type
   component_idx = 0 ... vlen - 1 : tag pointing to a member or argument
and kflag is always 0 and there is no need to check.

Differential Revision: https://reviews.llvm.org/D109560
2021-09-09 19:03:57 -07:00
Zequan Wu 12f80c0bbd [DebugInfo] Emit DW_AT_inline under -g1/-gmlt
Differential Revision: https://reviews.llvm.org/D109554
2021-09-09 18:59:50 -07:00
Matt Arsenault 0197cd0bd4 AMDGPU: Optimize amdgpu-no-* attributes
This allows clobbering a few extra registers in the fixed ABI, and
avoids some workitem ID packing instructions.
2021-09-09 18:24:28 -04:00
Matt Arsenault db4963d080 AMDGPU: Use attributor to propagate uniform-work-group-size
Drop the legacy version in AMDGPUAnnotateKernelFeatures. This has the
side effect of now respecting the linkage, and not changing externally
visible functions.
2021-09-09 18:24:28 -04:00
Matt Arsenault 722b8e0e5a AMDGPU: Invert ABI attribute handling
Previously we assumed all callable functions did not need any
implicitly passed inputs, and added attributes to functions to
indicate when they were necessary. Requiring attributes for
correctness is pretty ugly, and it makes supporting indirect and
external calls more complicated.

This inverts the direction of the attributes, so an undecorated
function is assumed to need all implicit imputs. This enables
AMDGPUAttributor by default to mark when functions are proven to not
need a given input. This strips the equivalent functionality from the
legacy AMDGPUAnnotateKernelFeatures pass.

However, AMDGPUAnnotateKernelFeatures is not fully removed at this
point although it should be in the future. It is still necessary for
the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which
would be better served by a trivial analysis on the IR during
selection. Additionally, AMDGPUAnnotateKernelFeatures still
redundantly handles the uniform-work-group-size attribute to be
removed in a future commit.

At this point when not using -amdgpu-fixed-function-abi, we are still
modifying the ABI based on these newly negated attributes. In the
future, this option will be removed and the locations for implicit
inputs will always be fixed. We will then use the new attributes to
avoid passing the values when unnecessary.
2021-09-09 18:24:28 -04:00
Philip Reames bfa2a81e92 [ScalarEvolution] Add an additional bailout to avoid NOT of pointer.
It's possible in some cases for the LHS to be a pointer where the RHS is not. This isn't directly possible for an icmp, but the analysis mixes up operands of different icmp expressions in some cases.

This does not include a test case as the smallest reduced case we've managed is extremely fragile and unlikely to test anything meaningful in the long term.

Also add an assertion to getNotSCEV() to make tracking down this sort of issue a bit easier in the future.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51787 .

Differential Revision: https://reviews.llvm.org/D109546
2021-09-09 15:19:36 -07:00
Philip Reames eede4846a9 [SCEV] Allow negative steps for LT exit count computation for unsigned comparisons
This bit of code is incredibly suspicious. It allows fully unknown (but potentially negative) steps, but not steps known to be negative. The comment about scev flag inference is worrying, but also not correct to my knowledge.

At best, this might be covering up some related miscompile. However, there's no test in tree for it, the review history doesn't include obvious motivation, and the C++ example doesn't appear to give wrong results when hand translated to IR. I think it's time to remove this and see what falls out.

During review, there were concerns raised about the correctness of the corresponding signed case.  This change was deliberately narrowed to the unsigned case which has been auditted and appears correct for negative values.  We need to get back to the known-negative signed case, but that'll be a future patch if nothing falls out from this one.

Differential Revision: https://reviews.llvm.org/D104140
2021-09-09 14:09:29 -07:00
Amy Kwan 351a0d8a90 [PowerPC] Update PC-Relative Load/Store Patterns to use the refactored Load/Store Implementation
This patch updates the PC-Relative load and store patterns to utilize the
refactored load/store implementation introduced in D93370.

PC-Relative implementation has been added to PPCISelLowering.cpp, and also the
patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity.
All existing test cases pass with this update.

Differential Revision: https://reviews.llvm.org/D95116
2021-09-09 15:38:42 -05:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Nikita Popov af382b9383 [IR] Handle constant expressions in containsUndefinedElement()
If the constant is a constant expression, then getAggregateElement()
will return null. Guard against this before calling HasFn().
2021-09-09 22:04:12 +02:00
Eli Friedman 8f792707c4 [ScalarEvolution] Fix pointer/int confusion in howManyLessThans.
In general, howManyLessThans doesn't really want to work with pointers
at all; the result is an integer, and the operands of the icmp are
effectively integers.  However, isLoopEntryGuardedByCond doesn't like
extra ptrtoint casts, so the arguments to isLoopEntryGuardedByCond need
to be computed without those casts.

Somehow, the values got mixed up with the recent howManyLessThans
improvements; fix the confused values, and add a better comment to
explain what's happening.

Differential Revision: https://reviews.llvm.org/D109465
2021-09-09 12:38:33 -07:00
Jameson Nash e20f69f612 [Aarch64] Correct register class for pseudo instructions
This constrains the Mov* and similar pseudo instruction to take
GPR64common register classes rather than GPR64. GPR64 includs XZR
which is invalid here, because this pseudo instructions expands
into an adrp/add pair sharing a destination register. XZR is invalid
on add and attempting to encode it will instead increment the stack
pointer causing crashes (downstream report at [1]). The test case
there reproduces on LLVM11, but I do not have a test case that
reaches this code path on main, since it is being masked by
improved dead code elimination introduced in D91513. Nevertheless,
this seems like a good thing to fix in case there are other cases
that dead code elimination doesn't clean up (e.g. if `optnone` is
used and the optimization is skipped).

I think it would be worth auditing uses of GPR64 in pseudo
instructions to see if there are any similar issues, but I do not
have a high enough view of the backend or knowledge of the
Aarch64 architecture to do this quickly.

[1] https://github.com/JuliaLang/julia/issues/39818

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D97435
2021-09-09 14:31:49 -04:00
Artem Belevich d99a83b4e5 [NVPTX] Simplify and generalize constant printer.
This allows handling i128 values and fixes
https://bugs.llvm.org/show_bug.cgi?id=51789.

Differential Revision: https://reviews.llvm.org/D109458
2021-09-09 11:30:19 -07:00
Craig Topper 517728fe1e [SelectionDAG] Use DAG.getNOT to further simplify some code. NFC
Followup to D109483
2021-09-09 10:53:39 -07:00
Nick Desaulniers e69d402088 [NFC] rename member of BitTestBlock and JumpTableHeader
Follow up to suggestions in D109103 via hans:
  I think UnreachableDefault (or UnreachableFallthrough) would be a
  better name now, since it doesn't just omit the range check, it also
  omits the last bit test.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109455
2021-09-09 10:43:00 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Craig Topper 124bcc1a13 [X86] Disable muloti4 libcalls for x86-64.
This library function only exists in compiler-rt not libgcc. So
this would fail to link unless we were linking with compiler-rt.

This is consistent with the recent removal of calls to mulodi4 on
32-bit targets like D108928.

I suppose maybe we could keep the libcalls for platforms like
Darwin that use compiler-rt exclusively?

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D109385
2021-09-09 10:03:15 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Neumann Hon 0782e55c26 [SystemZ] [NFC] Add SystemZELFFrameLowering and SystemZXPLINKFrameLowering classes.
This patch adds class SystemZFrameLowering which is a SystemZ-specific class
detailing special registers used by calling conventions on the target.
SystemZELFFrameLowering and SystemZXPLINKFrameLowering implement this class
for ELF and XPLINK64 respectively. Previous functionality in SystemZFrameLowering
is moved to SystemZELFFrameLowering. SystemZXPLINKFrameLowering can then be
implemented in future patches.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D108777
2021-09-09 12:23:40 -04:00
Kazu Hirata 92c9ff6d5f [IR, Transforms] Use arg_empty (NFC) 2021-09-09 08:50:10 -07:00
Sam Clegg 44177e5fb2 [WebAssembly] Add explict TLS symbol flag
As before we maintain backwards compat with older object files
by also infering the TLS flag based on the name of the segment.

This change is was split out from https://reviews.llvm.org/D108877.

Differential Revision: https://reviews.llvm.org/D109426
2021-09-09 10:03:30 -04:00
Sanjay Patel 97a4e7b7ff [InstCombine] remove a buggy set of zext-icmp transforms
The motivating case is an infinite loop shown with a reduced test from:
https://llvm.org/PR51762

To solve this, I'm proposing we delete the most obviously broken part of this code.

The bug example shows a fundamental problem: we ask computeKnownBits if a transform
will be profitable, alter the code by creating new instructions, then rely on
computeKnownBits to return the same answer to actually eliminate instructions.

But there's no guarantee that the results will be the same between the 1st and 2nd
calls. In the infinite loop example, we get different answers, so we add
instructions that conflict with some other transform, and we're stuck.

There's at least one other problem visible in the test diff for
`@zext_or_masked_bit_test_uses`: the code doesn't check uses properly, so we can
end up with extra instructions created.

Last, it's not clear if this set of transforms actually improves analysis or
codegen. I spot-checked a few targets and don't see a clear win:
https://godbolt.org/z/x87EWovso

If we do see a regression from this change, codegen seems like the right place to
add a cmp -> bit-hack fold.

If this is too big of a step, we could limit the computeKnownBits calls by not
passing a context instruction and/or limiting the recursion. I checked that those
would stop the infinite loop for PR51762, but that won't guarantee that some other
example does not fall into the same loop.

Differential Revision: https://reviews.llvm.org/D109440
2021-09-09 08:49:39 -04:00
Florian Mayer 6e12c73316 [NFC] [stack-safety] add placeholder addRange.
This is in preparataion of D108457.
2021-09-09 13:13:18 +01:00
Florian Mayer d261d4cf55 [stack-safety] [NFC] do not terminate print with blank line. 2021-09-09 12:31:09 +01:00
Florian Mayer 08b4dd8b24 [NFC] [stack-safety] remove unused return value. 2021-09-09 12:19:47 +01:00
Simon Pilgrim c31a202233 [X86][AVX] Add missing X86ISD::VBROADCAST(v2f64 -> v4f64) isel pattern for AVX1 targets
As discussed on the ticket, I'm intending to add additional 128->256 patterns when we have test coverage, but this addresses a known crash.

Differential Revision: https://reviews.llvm.org/D109434
2021-09-09 12:16:23 +01:00
Bradley Smith 8089f9ed5a [AArch64][SVE] Add missing patterns for unpredicated subr intrinsics
Differential Revision: https://reviews.llvm.org/D109369
2021-09-09 10:28:37 +00:00
Alfonso Sánchez-Beato b33fd31772 [yaml2obj][COFF] Allow variable number of directories
Allow variable number of directories, as allowed by the
specification. NumberOfRvaAndSize will default to 16 if not specified,
as in the past.

Reviewed by: jhenderson

Differential Revision: https://reviews.llvm.org/D108825
2021-09-09 11:16:56 +01:00
Sjoerd Meijer ecff9e3da5 [FuncSpec] Fixed minor formatting issues. NFC. 2021-09-09 10:36:54 +01:00
Roman Lebedev 909cba9699
[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125)
I can't seem to wrap my head around the proper fix here,
we should be fine without this requirement, iff we can form this form,
but the naive attempt (https://reviews.llvm.org/D106317) has failed.
So just to unblock the release, put up a restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125
2021-09-09 12:28:09 +03:00
Jun Ma 8ba2adcf9e Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""
Differential Revision: https://reviews.llvm.org/D106056
2021-09-09 16:53:33 +08:00
Cullen Rhodes d42f76fd36 [AArch64][SVE] NFC: Remove unused template args
For sve_fp_3op_p_zds_zx we have zero patterns downstream but the
intrinsic args can be added again if/when the patterns are implemented.

Identified in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109429
2021-09-09 07:10:57 +00:00
Cullen Rhodes 5b848a35d2 [AArch64][SVE] NFC: Use stepvector directly in index multiclasses
Also fixes a couple of warnings identified in D109359:

  SVEInstrFormats.td:5099:59: warning: unused template argument: sve_int_index_ri::step_vector
  SVEInstrFormats.td:5133:59: warning: unused template argument: sve_int_index_rr::step_vector

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109422
2021-09-09 07:10:57 +00:00
Alexander Pivovarov 4bc8dbe0ca [RISCV] Add SiFive cores E and S series
Add SiFive cores E20, E21, E24, E34, S21, S54 and S76

Differential Revision: https://reviews.llvm.org/D109260
2021-09-08 23:59:04 -07:00
Yvan Roux 261cbe98c3 [RISCV] Fix Machine Outliner jump table handling.
Don't outline machine instructions which are using jump table indexes
since they are materialized as local labels (like the already handled
case of constant pools).

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D109436
2021-09-09 07:32:30 +02:00
Chris Lattner 9e46dd965a [APInt.h] Reduce the APInt header file interface a bit. NFC
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.

This also upgrades some C style comments to C++.

Differential Revision: https://reviews.llvm.org/D109476
2021-09-08 18:17:07 -07:00
Jessica Paquette 22a64d4a14 [MachineOutliner][AArch64] Ensure LR is live-in when inserting reg-save calls
Similar to other code which handles creating the function frame.

If LR isn't live-in to the block that we're inserting the call into, we'll get
a MachineVerifier error.
2021-09-08 17:44:27 -07:00
Amara Emerson eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
Philip Reames e741fabc22 [SCEV] Move getIndexExpressionsFromGEP to delinearize [NFC] 2021-09-08 16:56:49 -07:00
Philip Reames 4b5e260b1d [SCEV] Simplify findExistingSCEVInCache interface [NFC]
We were returning a tuple when all but one caller only cared about one piece of the return value.  That one caller can inline the complexity, and we can simplify all other uses.
2021-09-08 15:26:07 -07:00
Andrew Litteken 144cd22bae [CodeExtractor] Creating exit stubs based off original order branch instructions.
Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108657
2021-09-08 15:15:15 -07:00
Arthur Eubanks fe15347a1e Port the cost model printer to New PM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109284
2021-09-08 14:47:05 -07:00
Craig Topper a574f0e0c3 [RISCV] Disable use of i128 shift libcalls on RV32.
Since i128 isn't a legal C type on RV32, I don't believe
libgcc implements these functions for RV32. compiler-rt
does implement them because i128 support is enabled
in order to handle long double.

This is consistent with 32-bit X86 and ARM.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109383
2021-09-08 14:26:07 -07:00
Michael Kruse 088577a38e [Delinerization] Require by offset to be zero.
Users of delinearization assume that the the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail since it violates that assumption without the API even allowing to signal to the caller that the by offset is non-zero.

This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image.

Patch D108885 would also fix the API.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D109133
2021-09-08 16:02:37 -05:00
Greg Clayton 14850a0628 Log to the right stream in DwarfTransformer::handleDie().
Since we might end up using multiple threads when logging information in the DWARFTransformer, the handleDie() method must use the supplied stream named "OS" when logging warnings and errors. When we use multiple threads, we log to a thread specific stream buffer and then use a mutex to ensure our output doesn't overlap when we emit warnings and errors after a thread is done.

Differential Revision: https://reviews.llvm.org/D109401
2021-09-08 14:00:19 -07:00
Florian Hahn f4726e7238
[LAA] Remove unused OrigPtr from replaceSymbolicStrideSCEV (NFC).
The OrigPtr argument is not used in tree.
2021-09-08 22:35:36 +02:00
Nikita Popov 6dfdc6bfd2 [SROA] Support opaque pointers
Make the following changes in order to support opaque pointers in SROA:

 * Generate i8 GEPs for opaque pointers.
 * Explicitly enforce that promotable allocas only have stores of
   the alloca type -- previously this was implicitly enforced.
 * Replace a check for pointer element type with load/store type.

Differential Revision: https://reviews.llvm.org/D109259
2021-09-08 22:25:44 +02:00
Arthur Eubanks b493124ae2 [MemorySSA] Support invariant.group metadata
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.

Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.

To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.

Co-authored-by: Piotr Padlewski <prazek@google.com>

Reviewed By: asbirlea, nikic, Prazek

Differential Revision: https://reviews.llvm.org/D109134
2021-09-08 13:06:12 -07:00
Philip Reames 585c594d74 Move delinearization logic out of SCEV [NFC]
None of this logic has anything to do with SCEV's internals, it just uses the existing public APIs.  As a result, we can move the code from ScalarEvolution.cpp/hpp to Delinearization.cpp/hpp with only minor changes.

This was discussed in advance on today's loop opt call.  It turned out to be easy as hoped.
2021-09-08 12:28:35 -07:00
Nikita Popov 3e54de4df2 [ConstantHoisting] Support opaque pointers
Directly use i8 for GEP, rather than fetching element type of i8*.
2021-09-08 21:23:10 +02:00
Akira Hatanaka dea6f71af0 [ObjC][ARC] Use the addresses of the ARC runtime functions instead of
integer 0/1 for the operand of bundle "clang.arc.attachedcall"

https://reviews.llvm.org/D102996 changes the operand of bundle
"clang.arc.attachedcall". This patch makes changes to llvm that are
needed to handle the new IR.

This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.

Differential Revision: https://reviews.llvm.org/D103000
2021-09-08 11:58:03 -07:00
Andrew Litteken 0087bb4a9a [IROutliner] Using canonical values to find corresponding values. (NFC)
D104143 introduced canonical value numbering between regions, which allows for the easy identification of items across a region, eliminating the need in the outliner to create parallel lists of instructions for each region, and replace output values in a less convoluted way.

Additionally, in a future commit, the output values will not necessarily be recorded values from the region itself, it could be a combination value where the actual value being output is a PHINode instead.  This new method allows us to handle the replacement of the output value to the stored value with the corresponding item in the same place for both normal output values, and PHINode outputs instead of handling the different types of outputs in different locations.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108656
2021-09-08 11:36:05 -07:00
Joseph Huber 6b9a3ec3a2 [OpenMP] Do not SPMDize generic regions with no parallel
This patch changes SPMDization to not trigger for regions with no
parallelism. Otherwise, this will introduce unnecessary barriers that
will slow the single-threaded region down.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109438
2021-09-08 14:33:15 -04:00
Nick Desaulniers 4331f19d8b [ISEL][BitTestBlock] omit additional bit test when default destination is unreachable
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.

  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
  bb.1:
    BT32rr ..
    JCC_1 %bb.2 ...
    JMP_1 %bb.3
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

  Should be equivalent to:
  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
    JMP_1 %bb.2
  bb.1:
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.

For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.

The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied.  D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:

vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b

I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.

Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109103
2021-09-08 11:03:47 -07:00
Kirill Stoimenov 3f875134a7 [asan] Fixed the jump to use the 4 byte offset version.
This should have been the 4 byte version in the first place. Unfortunatelly there is no easy way to add a test as both the 1 byte and 4 byte version are printed as 'jmp' in the assembly code.

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D109453
2021-09-08 17:58:12 +00:00
Wouter van Oortmerssen a99fb86c65 [WebAssembly] Change WebAssemblyMCLowerPrePass to ModulePass
It was a FunctionPass before, which subverted its purpose to collect ALL symbols before MCLowering, depending on how LLVM schedules function passes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51555

Differential Revision: https://reviews.llvm.org/D109202
2021-09-08 10:47:43 -07:00
Craig Topper aca14c8cf1 [RISCV] Remove unused tablegen template parameters. NFC
Identified in D109359
2021-09-08 10:01:42 -07:00
Craig Topper b04c09c07c [RISCV] Use V0 instead of VMV0: for mask vectors in isel patterns.
This is consistent with the RVV intrinsic patterns. This has been
shown to prevent some "ran out of registers" errors in our internal
testing.

Unfortunately, there are some regressions on LMUL=8 tests in here.
I think the lack of registers with LMUL=8 just makes it very hard
to schedule correctly.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109245
2021-09-08 09:46:21 -07:00
Benjamin Kramer 373b7622c1 [IROutliner] Remove unused variable. NFC. 2021-09-08 18:33:41 +02:00
Roman Lebedev 0852f8706b
[X86] X86DAGToDAGISel::matchBitExtract(): support 'num high bits to clear' pattern
Currently, we only deal with the case where we can match
the number of low bits to be kept, i.e.:
```
x & ((1 << y) - 1)
```
will extract low `y` bits of `x`.

But what will
```
x & (-1 >> y)
```
do?

Logically, it will extract `bitwidth(x) - y` low bits, i.e.:
```
x & ~(-1 << (bitwidth(x)-y))
```
... except we can't do such a transformation in IR in general,
because if we wanted to extract all the bits `(-1 >> 0)` is fine,
but `-1 << bitwidth(x)` would be `poison`: https://alive2.llvm.org/ce/z/BKJZfw,

Yet, here with BMI's BEXTR and BMI2's BZHI we don't have any such problems with edge-cases.
So what we can do is: https://alive2.llvm.org/ce/z/gm5M2B

As briefly discussed with @craig.topper, this appears to be not worse than what we'd end up with currently (a pair of shifts):
* https://godbolt.org/z/nsPb8bejs (direct data dependency, sequential execution)
* https://godbolt.org/z/7bj3zeh1d (no direct data dependency, parallel execution)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107923
2021-09-08 19:27:08 +03:00
Craig Topper 1f16191906 [RISCV] Add an GPR def to the Zvlseg SPILL/RELOAD pseudos
The expansion of these pseudos creates ADD instructions. Those
ADDs modify a GPR so that it is no longer contains the same value
as the input base pointer. Therefore, I believe we should have a
GPR as a Def on these instructions and expansion should get the
destination register for the ADDs from that operand.

At least in our tests here this works out so that register
scavenging picks the same register as the base pointer.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109405
2021-09-08 09:23:33 -07:00
Andrew Litteken c172f1ad39 [IROutliner] Adding supports for multiple exits
When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location.

This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106993
2021-09-08 08:58:07 -07:00
Kazu Hirata bcfbb3f9ec [IR] Construct SmallVector with iterator ranges (NFC)
Note that arg_operands has been deprecated in favor of args.
2021-09-08 08:54:15 -07:00
Peter Smith b026ce9c8a [MC] Add Subtarget for MAsmParser call to emitCodeAlignment
The call to emitCodeAlignment was missing a STI which is required
after D45962.

emitCodeAlignment has a default parameter of 0 for MaxBytesToEmit.
Explicitly passing 0 here was interpreted as as nullptr for the STI.
This could possibly be avoided by taking STI as a const reference in
emitCodeAlignment.

Differential Revision: https://reviews.llvm.org/D109425
2021-09-08 13:28:24 +01:00
Sjoerd Meijer 88a2031207 [FuncSpec] Fix typo in option description. NFC. 2021-09-08 12:58:46 +01:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Evgeny Leviant 93b09a2a5d [LiveDebugValues] Handle spills of indirect debug values correctly
When handling register spill for indirect debug value LiveDebugValues pass doesn't add
DW_OP_deref operator which may in some cases cause debugger to return value address, instead
of value while machine register holding that address is spilled.

Differential revision: https://reviews.llvm.org/D109142
2021-09-08 14:06:08 +03:00
Fraser Cormack 7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passing the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Cullen Rhodes 89786c2b99 [AArch64][SME] Fix imm bug in mov vector to tile aliases
Also fixes a warning mentioned in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109363
2021-09-08 07:42:16 +00:00
Sander de Smalen 981f7d563a [AArch64] Implement extract_subvector for predicates.
This patch implements extract_subvector for predicate types when
the input type is more than twice the size of the subvector that
is being extracted.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109314
2021-09-08 08:18:34 +01:00
Max Kazantsev 29d054bf12 [SimplifyCFG] Preserve knowledge about guarding condition by adding assume
This improvement adds "assume" after removal of branch basing on UB in successor block.

Consider the following example:

```
pred:
  x = ...
  cond = x > 10
  br cond, bb, other.succ

bb:
  phi [nullptr, pred], ... // other possible preds
  load(phi) // UB if we came from pred

other.succ:
  // here we know that x <= 10, but this knowledge is lost
  // after the branch is turned to unconditional unless we
  // preserve it with assume.
```

If we remove the branch basing on knowledge about UB in a successor block,
then the fact that x <= 10 is other.succ might be lost if this condition is
not inferrable from any dominating condition. To preserve this knowledge, we
can add assume intrinsic with (possibly inverted) branch condition.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109054
Reviewed By: lebedev.ri
2021-09-08 14:05:17 +07:00
Justin Latimer b0d4d969e2 [AVR] Add support for the tinyAVR 0-series and tinyAVR 1-series
Reviewed By: Dylan McKay, Ben Shi

Differential Revision: https://reviews.llvm.org/D103136
2021-09-08 02:35:26 +00:00
Ben Shi f0460fa4eb [AArch64] Improve target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding if it leads to worse code.

Reviewed By: dmgreen, kda

Differential Revision: https://reviews.llvm.org/D108871
2021-09-08 01:51:26 +00:00
Wang, Pengfei 9d7d34c769 [X86][MS] Fix the aligement mismatch of vector variable arguments on Win32
The alignment of vector variable arguments in callee side is 4, which is
aligned with MSVC. But the caller aligns them to the size of vector
arguments. It results in run fails. This patch fixes this problem by
trimming it to 4 bytes for variable arguments on Win32.

Fixed vector arguments are passed by pointer on Win32. So they don't have
the problem.

I don't find a doc in MSDN for this calling conversion, so I did several
experiments here: https://godbolt.org/z/n1zn1Gx1z

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108887
2021-09-08 09:26:44 +08:00
Philip Reames 6cdca906c7 [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count
The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values.

There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about.  During review, no one expressed a strong opinion, so we went with this one.

Differential Revision: D108651
2021-09-07 17:00:02 -07:00
Heejin Ahn a1d522939c [WebAssembly] Error out on indirect uses of setjmp
Both Wasm & Emscripten SjLj handling has a restriction that `setjmp`
cannot be called indirectly. I thought we have been erroring out on
indirect uses of `setjmp`, but some recent CL disrupted the logic and
we are not erroring out anymore.

We currently
1. Collect functions that contain `setjmp` calls in `SetjmpUsers`. This
   only counts direct calls:
   8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L869-L878)
2. Run `runSjLjOnFunction` only on those `SetjmpUsers`. Within
   `runSjLjOnFunction`, if we see an indirect use of `setjmp`, we error
   out:
   8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1218-L1221)

So if there are only indirect setjmp calls within the module,
`SetjmpUsers` will be empty, and `runSjLjOnFunction` is not even entered
once. And the indirect `setjmp` call will error out at link time. So in
this CL we check for the indirect uses of `setjmp` upfront before we
enter `runSjLjOnFunction`.

Also this currently errors out on `invoke @setjmp`, which can only occur
when using Wasm EH + Wasm SjLj within a function. We recently added Wasm
SjLj support but we don't support using Wasm EH + Wasm SjLj in the same
function yet. We plan to add this support very soon, so I don't think
it's worth creating another test file just for this. (This is an error
test so it needs its own file)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D109375
2021-09-07 15:52:58 -07:00
Arthur Eubanks 39e2e3bddb [NFC][C API] Make LLVMSetInstrParamAlignment's index param type LLVMAttributeIndex
It's the same as unsigned, but clearer in intent.
2021-09-07 15:13:45 -07:00
Rainer Orth 08ba87fa4b [Support] Implement getMainExecutable on Solaris
Many `flang` tests currently `FAIL` on Solaris because the module files
aren't found.  I could trace this to `sys::fs::getMainExecutable` not being
implemented.

This patch does this and fixes all affected `flang` tests.

Tested on `amd64-pc-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D109374
2021-09-07 22:56:10 +02:00
Philip Reames 9659069978 [SCEV] Further clarify comments regarding UB and zero stride
Follow on to D109029. I realized we had no mention of mustprogrress in the comment (as it prexisted mustprogress in the codebase). In the process of adding it, I tweaked the preconditions into something I think is more clear. Note that mustprogress is checked in the code.

Differential Revision: https://reviews.llvm.org/D109091
2021-09-07 13:53:56 -07:00
Sanjay Patel a3c1669b17 [InstCombine] fold icmp equality with 'or' mask ops
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
   (see just above the new code).
2. We try (too hard) to compensate for not having this
   and possibly other folds in transformZExtICmp(),
   and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.

https://alive2.llvm.org/ce/z/uEgn4P
2021-09-07 16:34:00 -04:00
Irina Dobrescu 7023cefe61 [AArch64][Global ISel] Add sext/zext of vector extract improvements
This patch adds improvements for sext/zext of a vector extract in Global
ISel.

For example, this piece of code:

define i64 @si64(<4 x i32> %0, i32 %1) {
  %3 = extractelement <4 x i32> %0, i64 1
  %s = sext i32 %3 to i64
  ret i64 %s
}

Used to have this lowering:
si64:
  mov s0, v0.s[1]
  fmov w8, s0
  sxtw x0, w8
  ret

Whereas this patch makes it lower to this:
si64:
  smov x0, v0.h[0]
  ret

Differential Revision: https://reviews.llvm.org/D108137
2021-09-07 21:17:51 +01:00
Arthur Eubanks 4b05341681 Don't check if the result of hasAttrSomewhere is non-zero in CallBase::getReturnedArgOperand()
Index is 0 when the return value has the returned attribute. But the
return value cannot have the returned attribute, so the check is
pointless.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D109334
2021-09-07 12:05:56 -07:00
Elliot Saba ae8507b0df [X86] Don't clobber EBX in stackprobes
On X86, the stackprobe emission code chooses the `R11D` register, which
is illegal on i686.  This ends up wrapping around to `EBX`, which does
not get properly callee-saved within the stack probing prologue,
clobbering the register for the callers.

We fix this by explicitly using `EAX` as the stack probe register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109203
2021-09-07 15:00:44 -04:00
Nikita Popov f5832eaaad [UseListOrder] Fix use list order for function operands
Functions can have a personality function, as well as prefix and
prologue data as additional operands. Unused operands are assigned
a dummy value of i1* null. This patch addresses multiple issues in
use-list order preservation for these:

 * Fix verify-uselistorder to also enumerate the dummy values.
   This means that now use-list order values of these values are
   shuffled even if there is no other mention of i1* null in the
   module. This results in failures of Assembler/call-arg-is-callee.ll,
   Assembler/opaque-ptr.ll and Bitcode/use-list-order2.ll.
 * The use-list order prediction in ValueEnumerator does not take
   into account the fact that a global may use a value more than
   once and leaves uses in the same global effectively unordered.
   We should be comparing the operand number here, as we do for
   the more general case.
 * While we enumerate all operands of a function together (which
   seems sensible to me), the bitcode reader would first resolve
   prefix data for all function, then prologue data for all
   functions, then personality functions for all functions. Change
   this to resolve all operands for a given function together
   instead.

Differential Revision: https://reviews.llvm.org/D109282
2021-09-07 20:59:12 +02:00
Arthur Eubanks 7f54009a1f Add missing overloads for Function::addRetAttr(s) 2021-09-07 11:52:22 -07:00
Nikita Popov 58db5f6e95 [ConstFold] Support opaque pointers in constexpr GEPs
Support opaque pointers in SymbolicallyEvaluateGEP() by using the
value type of a GlobalValue base or falling back to i8 if there
isn't one. We don't unconditionally generate i8 GEPs here because
that would lose inrange attribues, and because some optimizations
on globals currently rely on GEP types (e.g. the globals SROA
mentioned in the comment).

Differential Revision: https://reviews.llvm.org/D109297
2021-09-07 20:50:29 +02:00
Andy Kaylor 34528c32d2 Copy Elementtype Attribute to IR at Link step
Copying IR during linking causes a type mismatch due to the field being missing in IRMover/Valuemapper. Adds the full range of typed attributes including elementtype attribute in the copy functions.

Patch by Chenyang Liu

Differential Revision: https://reviews.llvm.org/D108796
2021-09-07 11:41:43 -07:00
Arthur Eubanks b81fc14f2d [NFC][InstCombine] Make check for sret in a vararg function clearer
We're trying to get the parameter index of sret and see if it's part of
a function's varargs.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D109335
2021-09-07 11:19:27 -07:00
Roman Lebedev 35fa7b8ad8
Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 91f7a4fff7,
relanding commit 13ec913bdf.

The original commit was reverted because of (essentially)
https://bugs.llvm.org/show_bug.cgi?id=35922
which has now been addressed by d0eeb64be5.
2021-09-07 21:03:52 +03:00
Nick Desaulniers d0eeb64be5 [X86ISelLowering] avoid emitting libcalls to __mulodi4()
Similar to D108842, D108844, and D108926.

__has_builtin(builtin_mul_overflow) returns true for 32b x86 targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ARCH=i386 + CONFIG_BLK_DEV_NBD=y builds of the Linux kernel
that are using builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://bugs.llvm.org/show_bug.cgi?id=35922
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: lebedev.ri, RKSimon

Differential Revision: https://reviews.llvm.org/D108928
2021-09-07 10:44:54 -07:00
Simon Pilgrim 9eda472112 [X86] X86InstrAVX512.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 17:38:20 +01:00
Kazu Hirata 5648f7170e [Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC) 2021-09-07 09:19:33 -07:00
Kazu Hirata 5c6338de16 [RISCV] Fix "set but not used" warnings 2021-09-07 09:19:31 -07:00
Dávid Bolvanský 3b5f318f5d [InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567)
```
----------------------------------------
define i1 @src(i32 %0) {
%1:
  %2 = fshl i32 %0, i32 %0, i32 25
  %3 = icmp eq i32 %2, 5
  ret i1 %3
}
=>
define i1 @tgt(i32 %0) {
%1:
  %2 = icmp eq i32 %0, 640
  ret i1 %2
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/GdY8Jm

Solves PR51567

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109283
2021-09-07 18:04:58 +02:00
Andrew Litteken 81d3ac0cf2 [IROutliner] Adding outlining for single entry/single exit multiblock regions
Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality.

For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D106990
2021-09-07 08:51:54 -07:00
Victor Huang 4a226529e2 [PowerPC] Fixed the crash due to early if conversion with fixed CR fields
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.

Reviewed By: lei, kamaub, PowerPC

Differential revision: https://reviews.llvm.org/D108302
2021-09-07 10:51:03 -05:00
Simon Pilgrim f8d2cd1428 [X86] Add missing domain to avx512_ord_cmp_sae comis sae patterns
It doesn't appear to be possible to generate this from tests atm, but it matches what we do in sse12_ord_cmp

Fixes unused template arg identified in D109359
2021-09-07 16:20:21 +01:00
Jinsong Ji 042a6564d3 [PowerPC] Guard XSRSP in P8 for FastISel
This is exposed by enabling FastIsel on 64bit AIX.
We are generating XSRSP regardless of the arch,
which may be wrong when -mcpu=pwr7.

The fix is to guard the generation in P8 only.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D109365
2021-09-07 15:17:51 +00:00
Sander de Smalen bd576e5ac0 [AArch64][SVE] Improve extract_subvector for predicates.
Using PUNPKLO/HI instead of ZIP1/ZIP2, because that avoids
having to generate a predicate with all lanes inactive (PFALSE).

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109312
2021-09-07 15:49:29 +01:00
Peter Smith e63455d5e0 [MC] Use local MCSubtargetInfo in writeNops
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962
2021-09-07 15:46:19 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Michael Liao 640beb38e7 [amdgpu] Enable selection of `s_cselect_b64`.
Differential Revision: https://reviews.llvm.org/D109159
2021-09-07 10:45:07 -04:00
Mirko Brkusanin 6c4b634da6 [AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
Legalizing G_MUL for non-standard types (like i33) generated an error. Putting
minScalar and maxScalar instead of clampScalar. Also using new rule, instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it to
i96.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D109228
2021-09-07 16:33:24 +02:00
Mirko Brkusanin 5263bf583a [AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions, the only
difference is funnel shifts have 3 operands, whereas rotate instructions have
two operands, the first being the register that is being rotated and the second
being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D105347
2021-09-07 16:33:24 +02:00
Simon Pilgrim 0d48ee2774 [X86] X86InstrSSE.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 15:13:05 +01:00
Simon Pilgrim b50a60c234 [X86] X86InstrVecCompiler.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 14:46:08 +01:00
Simon Pilgrim fb38795062 [X86] X86InstrFMA.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 14:46:07 +01:00
Anton Afanasyev d1f9b21677 [AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine
Add support for @llvm.assume() to TruncInstCombine allowing
optimizations based on these intrinsics while computing known bits.
2021-09-07 16:45:00 +03:00
Anton Afanasyev 8c0a1940c1 [AggresiveInstCombine] Add wrapper calls for `KnownBits` computing
Precommit before `AssumptionCache` adding: reviews.llvm.org/D109141

Differential Revision: https://reviews.llvm.org/D109288
2021-09-07 16:45:00 +03:00
Sander de Smalen 448d47f743 [AArch64][SVE] Implement all-inactive predicate with PFALSE.
Instead of using a WHILE XZR, XZR instruction, just emit a PFALSE.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109311
2021-09-07 14:29:02 +01:00
Simon Pilgrim 0a07ae6ebf [KnownBits] Add support for X*X self-multiplication
Add KnownBits handling and unit tests for X*X self-multiplication cases which guarantee that bit1 of their results will be zero - see PR48683.

https://alive2.llvm.org/ce/z/NN_eaR

The next step will be to add suitable test coverage so this can be enabled in ValueTracking/DAG/GlobalISel - currently only a single Analysis/ScalarEvolution test is affected.

Differential Revision: https://reviews.llvm.org/D108992
2021-09-07 11:43:45 +01:00
Mirko Brkusanin 36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Andrew Wei da9ed3dc71 [AArch64] Avoid adding duplicate implicit operands when expanding pseudo insts.
When expanding pseudo insts, in order to create a new machine instr, we use BuildMI,
which will add implicit operands by default. And transferImpOps will also copy implicit
operands from old ones. Finally, duplicate implicit operands are added to the same inst.
Sometimes this can cause correctness issues. Like below inst,
    renamable $w18 = nsw SUBSWrr renamable $w30, renamable $w14, implicit-def dead $nzcv
After expanding, it will become
    $w18 = SUBSWrs renamable $w13, renamable $w14, 0, implicit-def $nzcv, implicit-def dead $nzcv
A redundant implicit-def $nzcv is added, but the dead flag is missing.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109069
2021-09-07 17:11:58 +08:00
luxufan ffcaa80f7e [RuntimeDyld] Don't use bitwise operation on SymbolRef::Type
Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D109292
2021-09-07 16:58:35 +08:00
Ben Shi 63ca9371c7 [ARM] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding in DAGCombine if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109124
2021-09-07 15:42:43 +08:00
Craig Topper da3ef8b756 [X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops.
This is a more general version of D109273. Though it doesn't
peek through bitcasts or rearange broadcasts.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109295
2021-09-06 17:44:52 -07:00
Fangrui Song 76529b4468 [X86] Simplify condition guarding emitCalleeSavedFrameMoves. NFC 2021-09-06 15:54:02 -07:00
Fangrui Song 4f1e410a1b [X86] Simplify two hasFP(F). NFC 2021-09-06 15:47:40 -07:00
Nikita Popov 8d54c8a0c3 [SCEV] Fix applyLoopGuards() with range check idiom (PR51760)
Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3))
rather than umax(C1, umin(C2, %x)). This didn't make a difference
for the existing tests, because the result is only used for range
calculation, and %x will usually have an unknown starting range,
and the additional offset keeps it unknown. However, if %x already
has a known range, we may compute a result range that is too
small.
2021-09-06 22:22:41 +02:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Kazu Hirata 3322354bfc [Support] Qualify auto (NFC)
Identified with readability-qualified-auto.
2021-09-06 09:10:07 -07:00
Jonas Paulsson 118997d8e9 [SelectionDAGBuilder] Bugfix in visitInlineAsm()
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this works generally also with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().

Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
2021-09-06 17:46:31 +02:00
Sanjay Patel 0d83e72034 [InstCombine] fix infinite loop from shift transform
I'm not sure if there is a better way or another bug
still here, but this is enough to avoid the loop from:
https://llvm.org/PR51657

The test requires multiple blocks and datalayout to
trigger the problem path.
2021-09-06 11:13:39 -04:00
Sanjay Patel c85f450619 [InstCombine] refactor to reduce indent; NFC
This transform should be updated to use better
variable names and code comments. It could
also create the shift-of-shift directly instead
of relying on another combine for that.
2021-09-06 11:13:39 -04:00
Sanjay Patel fbb78668f2 [InstCombine] fix one-use condition for shift transform
This transform is written in a confusing style,
and I suspect it is at fault for a more serious
bug noted in PR51567.

But it's been around forever, so I'm making the
minimal change to fix another bug - it could
increase instructions because it was not checking
uses.
2021-09-06 11:13:39 -04:00
Sanjay Patel 982a15cb3f [InstCombine] early exit to reduce indentation; NFC 2021-09-06 11:13:38 -04:00
Victor Campos 79f9c79aaf [AArch64][MC] Merge FeaturePMU into FeaturePerfMon
FeaturePMU was created in AArch64 to accommodate one missing system
register, PMMIR_EL1, in commit ffcd7698ae.

However, the Performance Monitors extension already had a target
feature, which is called FeaturePerfMon. Therefore, FeaturePMU is
redundant.

This patch removes FeaturePMU and merges its contents into
FeaturePerfMon.

Reviewed By: dnsampaio

Differential Revision: https://reviews.llvm.org/D109246
2021-09-06 14:56:49 +01:00
David Truby b297531ece [AArch64][sve] Prevent incorrect function call on fixed width vector
The isEssentiallyExtractHighSubvector function currently calls
getVectorNumElements on a type that in specific cases might be scalable.
Since this function only has correct behaviour at the moment on scalable
types anyway, the function can just return false when given a fixed type.

Differential Revision: https://reviews.llvm.org/D109163
2021-09-06 14:25:03 +01:00
Sander de Smalen 96f6785bc9 [VectorUtils] Teach findScalarElement to return splat value.
If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.

This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.

This helps to recognize an InstCombine fold like:
  extractelt(bitcast(splat(%v))) -> bitcast(%v)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D107254
2021-09-06 10:56:06 +01:00
Tianqing Wang 12fa608af4 [X86] Add CRC32 feature.
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105462
2021-09-06 17:24:30 +08:00
Moritz Sichert a0a5964499 [RuntimeDyld] Implemented relocation of TLS symbols in ELF
Differential Revision: https://reviews.llvm.org/D105466
2021-09-06 10:27:43 +02:00
Moritz Sichert f687378603 [RuntimeDyld] Implemented relocation for ELF::R_X86_64_GOTPC32
Differential Revision: https://reviews.llvm.org/D95512
2021-09-06 10:26:37 +02:00
Fangrui Song 0e03450ae4 [AArch64] Remove an uneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
guopeilin 5f48c144c5 [AArch64][GlobalISel] Use ZExtValue for zext(xor) when invert tb(n)z
Currently, we use SExtValue to decide whether to invert tbz or tbnz.
However, for the case zext (xor x, c), we should use ZExt rather
than SExt otherwise we will generate totally opposite branches.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D108755
2021-09-06 11:12:07 +08:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
Simon Pilgrim f114ef3731 [CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDDW folds
Based off the improved fold in D108522

This should eventually allow us to replace the SLM only cost patterns with generic versions.
2021-09-05 16:08:11 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
Dávid Bolvanský 9c476172b9 [InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used 2021-09-05 12:12:07 +02:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Fangrui Song e03c8d309a [AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
2021-09-04 10:50:10 -07:00
Dávid Bolvanský 3a696f6092 [InstCombine] rotate(X,Z) eq/ne rotate(Y,Z) ---> X eq/ne Y (PR51565)
```

----------------------------------------
define i1 @src(i8 %x, i8 %y, i8 %z) {
%0:
  %f = fshl i8 %x, i8 %x, i8 %z
  %f2 = fshl i8 %y, i8 %y, i8 %z
  %r = icmp eq i8 %f, %f2
  ret i1 %r
}
=>
define i1 @tgt(i8 %x, i8 %y, i8 %z) {
%0:
  %r = icmp eq i8 %x, %y
  ret i1 %r
}
Transformation seems to be correct!

```

https://alive2.llvm.org/ce/z/qAZp8f

Solves PR51565

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109271
2021-09-04 18:58:44 +02:00
Bjorn Pettersson 0f0344dd1e [SimpleLoopUnswitch] Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).

Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D109257
2021-09-04 17:54:39 +02:00
Shivam Gupta 5449d2da65 [NFC] Run clang-format on llvm/lib/Trget/AVR/
The current inconsistency confuse contributors which coding guidlines to follow.
It would be better to have it consistent using clang-format tool.

Reviewed By: mhjacobson

Differential Revision: https://reviews.llvm.org/D109270
2021-09-04 20:05:15 +05:30
Simon Pilgrim cb8d96e72f Fix Wdocumentation unknown parameter warning. NFCI. 2021-09-04 15:06:53 +01:00
Simon Pilgrim 2005ae15a6 [X86][SLM] WriteVecIMul instructions only take 1uop (REAPPLIED)
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.

I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.

But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
2021-09-04 15:03:56 +01:00
Simon Pilgrim ac51d69208 Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop"
This changed some codegen tests that I forgot about in my rebase, I'll recommit shortly with a fix.
2021-09-04 13:39:10 +01:00
Simon Pilgrim 994da65707 [X86][SLM] WriteVecIMul instructions only take 1uop
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.

I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.

But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
2021-09-04 13:21:34 +01:00
Simon Pilgrim c6371020a8 [X86][SLM] RMW instructions don't require an extra uop
For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:

"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and
store instructions go through addresses generation phase in program order to avoid on-the-fly memory
ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."

Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.
2021-09-04 13:21:34 +01:00
Simon Pilgrim da965a77d5 [X86][SLM] Fix MUL uops, latency and throughput
These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).

Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-04 13:21:34 +01:00
Simon Pilgrim 7d062d2c47 [X86][Atom] MUL/DIV instructions require both ports, not either.
Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM.
2021-09-04 11:58:09 +01:00
Simon Pilgrim 0d0f39b0f3 [X86][Atom] Add missing UOps override to AtomWriteResPair multiclass
Make it easier to describe microcoded instructions.
2021-09-04 11:58:09 +01:00
Nikita Popov 66a54af967 [WebAssembly] Support opaque pointers in AddMissingPrototypes
The change here is basically the same as in D108880: Rather than
looking at bitcasts, look at calls and their function type. We
still need to look through bitcasts to find those calls.

The change in llvm/test/CodeGen/WebAssembly/add-prototypes-conflict.ll
is due to different visitation order. add-prototypes-opaque-ptrs.ll
is a copy of add-prototypes.ll with -force-opaque-pointers.

Differential Revision: https://reviews.llvm.org/D109256
2021-09-04 11:25:42 +02:00
Kazu Hirata bb51f76fb1 [ForceFunctionAttrs] Add const (NFC) 2021-09-03 22:29:58 -07:00
Kevin Athey c7f50a445e Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)"
This reverts commit 095bea23d0.

Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411
2021-09-03 18:08:58 -07:00
Ben Shi 095bea23d0 [AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108871
2021-09-04 07:24:23 +08:00
David Blaikie bc066e26c9 DebugInfo: Fix a few bot failures for type dumping fixes 2021-09-03 14:08:58 -07:00
David Blaikie 40f1593558 DebugInfo: Correct/improve type formatting (pointers to function types especially)
This does add some extra superfluous whitespace (eg: "int *") intended
to make the Simplified Template Names work easier - this makes the
DIE-based names match more exactly the clang-generated names, so it's
easier to identify cases that don't generate matching names.

(arguably we could change clang to skip that whitespace or add some
fuzzy matching to accommodate differences in certain whitespace - but
this seemed easier and fairly low-impact)
2021-09-03 12:22:28 -07:00
Sanjay Patel fd807601a7 [InstCombine] fold (rotate X) eq/ne (0/-1)
This generalizes the examples shown in:
https://llvm.org/PR51566

https://alive2.llvm.org/ce/z/V-sEy9
2021-09-03 14:51:35 -04:00
Sanjay Patel d1458903eb [InstCombine] reduce code duplication; NFC 2021-09-03 14:51:35 -04:00
Stanislav Mekhanoshin d0c064715c [AMDGPU] Small cleanup in optimizeCompareInstr. NFC. 2021-09-03 11:31:40 -07:00
David Green adfd12e6d1 [ARM] Add patterns for store(fptosisat(..))
As an extension to D107866, this adds store(fptosisat(..)) patterns,
similar to the existing fptosi patterns, to prevent unnecessarily moving
into gpr regs where we can use fp stores directly.

Differential Revision: https://reviews.llvm.org/D108378
2021-09-03 19:22:11 +01:00
David Green f37e132263 [ARM] Add VFP lowering for fptosi.sat
This extends D107865 to the VFP insructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866
2021-09-03 18:11:08 +01:00
Craig Topper 75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
David Spickett 02b4620348 [ORC] Static cast more uint64_t to size_t
These instances don't have an obvious way to fail
nicely so I've just asserted they are within range.

Fixes the Arm 32 bit builds.
2021-09-03 12:30:56 +00:00
Max Kazantsev 718157283c [LoopDeletion] Move ICmpInst handling to getValueOnFirstIteration()
As noticed in https://reviews.llvm.org/D105688, it would be great to move
handling of ICmpInst which was in canProveExitOnFirstIteration() to
getValueOnFirstIteration().

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108978
Reviewed By: reames
2021-09-03 18:36:19 +07:00
Konstantin Schwarz 90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Simon Pilgrim 6ba0b9f68a [X86][SLM] Fix PBLENDVB uops and throughput
SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64.

Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).
2021-09-03 11:31:29 +01:00
gbreynoo e28cd75a50 [OptTable] Reapply Improve error message output for grouped short options
This reapplies 71d7fed3bc which was
reverted by 3e2bd82f02. This change
includes the fix for breaking the sanitizer bots.

As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.

Differential Revision: https://reviews.llvm.org/D108770
2021-09-03 11:13:52 +01:00
Florian Mayer abf8ed8a82 [hwasan] Support more complicated lifetimes.
This is important as with exceptions enabled, non-POD allocas often have
two lifetime ends: the exception handler, and the normal one.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108365
2021-09-03 10:29:50 +01:00
Stefan Gränitz 2ed91da0f1 [JITLink] Add initial Aarch64 support
Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment.

This patch follows the idea to keep implementations for ARM (32-bit) and Aaarch64 (64-bit) separate, because:
* it might be easier to share code with the MachO "arm64" JITLink backend
* LLVM has individual targets for ARM and Aaarch64 as well

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108986
2021-09-03 10:48:06 +02:00
Jingu Kang 562521e2d1 [LoopBoundSplit] Update phi node in exit block
It fixes https://bugs.llvm.org/show_bug.cgi?id=51700

Differential Revision:
2021-09-03 09:10:50 +01:00
Cullen Rhodes dc5dd77ac7 [AArch64][SME] Support NEON vector to GPR integer moves in streaming mode
A small subset of the NEON instruction set is legal in streaming mode.
This patch adds support for the following vector to integer move
instructions:

  0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.B[0]
  0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.H[0]
  0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0]
  0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0]
  0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0]
  0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0]
  0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0]

Only the zero index variants are legal, all others indexes are illegal.
To support this, new instructions are defined specifically for zero
index which is hardcoded, along an implicit 'VectorIndex0' operand.
Since the index operand is implicit and takes no bits in the encoding,
custom decoding is required to add the operand.

I'm not sure if this is the best approach but the predicate constraint
on a subset of an operand is unusual. Would be interested to hear some
alternatives.

The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're
enabled by either +neon or +streaming-sve. This follows on from the work
in D106272 to support the subset of SVE(2) instructions that are legal
in streaming mode.

Depends on D107902.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D107903
2021-09-03 07:59:17 +00:00
Cullen Rhodes 1dcd900d1d [AArch64][ISel] NFC: DAG.getMachineFunction() -> MF
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109135
2021-09-03 07:59:17 +00:00
Amara Emerson 6d9505b8e0 [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.
This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.

Differential Revision: https://reviews.llvm.org/D109206
2021-09-02 21:37:24 -07:00
Qiu Chaofan d0f9553ef5 [PowerPC] Enable fast-isel on AIX 64 subtarget
This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D98844
2021-09-03 11:33:45 +08:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not a right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Matt Arsenault 79bcd4a7db AMDGPU: Remove FeatureLocalMemorySize0
There's no reason to make this an explicit feature, since it's implied
by the lack of a feature with a size.
2021-09-02 22:43:01 -04:00
Alexander Pivovarov 6cd4b508a8 [RISCV] Add SiFive core S51
Add SiFive core s51 as rv64imac RocketModel

Reviewed-By: MaskRay, evandro
Differential Revision: https://reviews.llvm.org/D108886
2021-09-02 18:45:25 -07:00
PeixinQiao a42380ce83 [OMPIRBuilder] Add ordered directive to OMPBuilder
Add support for ordered directive in the OpenMPIRBuilder.

This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430
2021-09-03 09:37:58 +08:00
Anna Thomas f661ce209f [LoopPredication] Fix MemorySSA crash in predicateLoopExits
The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
2021-09-02 21:26:07 -04:00
Alexander Pivovarov 1104e3258b Fix typo in RISCVMatInt.cpp comments 2021-09-02 18:11:09 -07:00
Stanislav Mekhanoshin 78fbd1aa3d [AMDGPU] Process any power of 2 in optimizeCompareInstr
Differential Revision: https://reviews.llvm.org/D109201
2021-09-02 17:39:17 -07:00
Xun Li 2cf30c4769 [Coroutines] Only run verifyFunction in debug mode
verifyFunction can be really slow on large functions. This can significantly slow down compilation in production.
Given that coroutine passes are fairly stable now, we should only run it in debug mode.

Differential Revision: https://reviews.llvm.org/D109198
2021-09-02 17:35:01 -07:00
Wenlei He 054487c5b2 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 17:29:26 -07:00
Stanislav Mekhanoshin 2cfda6a691 [AMDGPU] Fold immediates in the optimizeCompareInstr
Peephole works before the first SIFoldOperands so most of
the immediates are in registers.

Differential Revision: https://reviews.llvm.org/D109186
2021-09-02 17:23:26 -07:00
Sam Clegg c32884c482 [WebAssembly] Rename WrapperPIC -> WrapperREL. NFC
This ISD node/wrapper represents am address which is relative to a base
address and therefore lowers to `i32.const` rather than `global.get`.

Use this wrapper type for TLS-relative addresses, paving the way for the
non-REL wrapper to be used to external TLS address once those are
supported.

Differential Revision: https://reviews.llvm.org/D109179
2021-09-02 20:04:34 -04:00
Philip Reames fa82a3d016 [runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476
2021-09-02 16:29:20 -07:00
Philip Reames 45c672e20d [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info
Requested in review comment on D108476
2021-09-02 16:28:46 -07:00
Lang Hames dad60f8071 [ORC] Add EPCGenericJITLinkMemoryManager: memory management via EPC calls.
All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.
2021-09-03 08:28:29 +10:00
Jessica Paquette 844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Kirill Stoimenov cf53c6c971 [asan] Fixed link error by setting jump symbol to R_X86_64_PLT32.
Fixing this link error:
ld: error: relocation R_X86_64_PC32 cannot be used against symbol __asan_report_load...; recompile with -fPIC

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109183
2021-09-02 21:50:56 +00:00
Kevin Athey 04ed6e7afc Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"
This reverts commit a2768b4732.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll
2021-09-02 14:48:31 -07:00
Arthur Eubanks 813a7f1ad7 [MemorySSA] Properly handle liveOnEntry in the walker printer
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177
2021-09-02 12:51:27 -07:00
Arthur Eubanks 92b94a6d0c [Verifier] Only allow invariant.group metadata on stores and loads
As specified by https://llvm.org/docs/LangRef.html#invariant-group-metadata.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109182
2021-09-02 12:49:04 -07:00
Sam Clegg 4664590d53 [WebAssemlby] Remove redundant SDTypeProfile. NFC
I added this back in https://reviews.llvm.org/D54647 but it wasn't
actually needed.

Differential Revision: https://reviews.llvm.org/D109176
2021-09-02 15:21:22 -04:00
Wenlei He f7fff46acc [CSSPGO] Allow inlining recursive call for preinliner
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104
2021-09-02 11:24:27 -07:00
Nikita Popov c86e1ce73b [SCEVExpander] Simplify pointer overflow check
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093
2021-09-02 20:15:59 +02:00
Sam Clegg ad2f94f398 [WebAssembly] Fix names of WebAssemblyWrapper SDNodes. NFC
Other platforms all use CamelCase as normal for these wrapper nodes.

Differential Revision: https://reviews.llvm.org/D109172
2021-09-02 13:54:44 -04:00
Heejin Ahn 28780e59f6 [WebAssembly] Add Wasm SjLj support
This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960
2021-09-02 10:51:02 -07:00
Nick Desaulniers 6860b136b9 [MipsISelLowering] avoid emitting libcalls to __multi3
Similar to D108842 and D108844.

__has_builtin(builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks MIPS malta_defconfig builds of the Linux kernel that are
using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108926
2021-09-02 10:41:37 -07:00
Daniil Suchkov 5c97507e2b [InlineCost] Introduce attributes to override InlineCost for inliner testing
This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033
2021-09-02 17:35:06 +00:00
Craig Topper 3e89cc5cda [X86] Remove isel predicates for xgetbv/xsetbv instructions so they can work on Windows.
https://reviews.llvm.org/D56686  was supposed to allow these to
work on Windows without needing to enable the xsave feature to
match MSVC. It seems this didn't work because the backend isel
patterns would still block it.

This patch removes the predicates from the isel patterns.

Fixes PR51706.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109097
2021-09-02 10:25:02 -07:00
Stanislav Mekhanoshin 832c87b4fb [AMDGPU] Use S_BITCMP0_* to replace AND in optimizeCompareInstr
These can be used for reversed conditions if result of the AND
is unused except in the compare:

s_cmp_eq_u32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0
s_cmp_eq_i32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0
s_cmp_eq_u64 (s_and_b64 $src, 1), 0 => s_bitcmp0_b64 $src, 0
s_cmp_lg_u32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0
s_cmp_lg_i32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0
s_cmp_lg_u64 (s_and_b64 $src, 1), 1 => s_bitcmp0_b64 $src, 0

Differential Revision: https://reviews.llvm.org/D109099
2021-09-02 09:38:01 -07:00
Simon Pilgrim d66d520fe1 [X86][SSE] combineMulToPMADDWD - improve recognition of sign/zero extended upper bits
PMADDWD(v8i16 x, v8i16 y) == (v4i32) { (int)x[0]*y[0] + (int)x[1]*y[1], ..., (int)x[6]*y[6] + (int)x[7]*y[7] }

Currently combineMulToPMADDWD only folds cases where the upper 17 bits of both vXi32 inputs are known zero (i.e. the first half is positive and the second half of the pair is zero in each 2xi16 pair), this can be relaxed to only require one zero-extended input if the other input has at least 17 sign bits.

That way the sign of the result is still preserved, and the second half is still zero.

Noticed while investigating PR47437.

Differential Revision: https://reviews.llvm.org/D108522
2021-09-02 17:36:22 +01:00
Kazu Hirata e1bb54b593 [clangd, llvm] Remove redundant calls to c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-09-02 09:07:13 -07:00
Wenlei He a2768b4732 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 08:24:06 -07:00
Bradley Smith 14e1a4a6ee [AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter
When lowering a fixed length gather/scatter the index type is assumed to
be the same as the memory type, this is incorrect in cases where the
extension of the index has been folded into the addressing mode.

For now add a temporary workaround to fix the codegen faults caused by
this by preventing the removal of this extension. At a later date the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.

As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.

Differential Revision: https://reviews.llvm.org/D109145
2021-09-02 15:07:24 +00:00
Craig Topper b5fd6b46f5 [RISCV] Teach instruction selection to elide sext.w in some cases.
If a sext_inreg is up for isel, and all its users are W instructions,
we can skip emitting the sext_inreg. This helpful if the producing
instruction can't become a W instruction.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108966
2021-09-02 07:54:34 -07:00
Evandro Menezes 5ebdb07e7e [RISCV] Enable shrink wrap by default
Differential Revision: https://reviews.llvm.org/D109037
2021-09-02 09:47:58 -05:00
Craig Topper e4e69ba4d1 [RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1.
X0 has special meaning for vsetvli, we need to make sure we never
create it a vsetvli that uses it by accident. This could happen
if the register coalescer coalesces a copy from X0 into this
instruction.

This patch splits the instruction so that we can have GPRNoX0
register class to use for the cases where we don't want the source
to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0
operand so we need a separate pseudo for those cases.

I don't currently have a failing example for this. There was a
failure in D107957, but the coalescable copy from that example
should have been optimized away much earlier so I've fixed that.

This is not a complete fix. We still need to prevent the same
possible issue on the AVL operand of all of the vector instruction
pseudos. I don't want to make two versions of all of those so we
need to find a different solution for those. I have an idea I'm
going to try.

Differential Revision: https://reviews.llvm.org/D109110
2021-09-02 07:45:31 -07:00
Piotr Sobczak 30d6c39bca [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD.

Merging into DWORDX2 and DWORDX4 variants is handled already.

Differential Revision: https://reviews.llvm.org/D108909
2021-09-02 16:26:25 +02:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Simon Pilgrim b0acd6c369 [X86] Fold PMADD(x,0) or PMADD(0,x) -> 0
Pulled out of D108522 - handle zero-operand cases for PMADDWD/VPMADDUBSW ops
2021-09-02 10:48:50 +01:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Fraser Cormack ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
Simon Moll ea2cdbf5e6 [VP] Declaration and docs for vp.select intrinsic
llvm.vp.select extends the regular select instruction with an explicit
vector length (%evl).

All lanes with indexes at and above %evl are
undefined. Lanes below %evl are taken from the first input where the
mask is true and from the second input otherwise.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D105351
2021-09-02 11:17:14 +02:00
David Sherwood d581d94385 [SVE] Fix the FP arithmetic instruction costs for SVE
Several FP instructions (fadd, fsub, etc.) were incorrectly assigned
a higher cost for SVE because they have custom lowering, however we
know they are legal. This patch explicitly assigns a cost of 2 to
these opcodes.

Tests added here:

  Analysis/CostModel/AArch64/arith-fp-sve.ll

Differential Revision: https://reviews.llvm.org/D108993
2021-09-02 09:55:13 +01:00
Fangrui Song dfb7518df1 [MC] Set SHF_INFO_LINK on SHT_REL/SHT_RELA sections
sh_info links to a section, therefore SHF_INFO_LINK should be set as GNU as
does. The issue has been benign because linkers kindly combines relocation
sections w/ and w/o the flag.
2021-09-02 01:00:51 -07:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Wenlei He c000b8bd5c [CSSPGO] Use preinliner decision by default when available
For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of llvm-profgen preinliner as it's now simply driven by ContextShouldBeInlined flag for each context profile without needing extra compiler switch.

Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`.

Differential Revision: https://reviews.llvm.org/D109111
2021-09-01 23:45:38 -07:00
Markus Lavin 304f2bd21d [NPM] Added opt option -print-pipeline-passes.
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.

As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass

At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
  (currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
  BitcodeWriterPass).

Differential Revision: https://reviews.llvm.org/D108298
2021-09-02 08:23:33 +02:00
Markus Lavin 645af79e8e Revert "[NPM] Added opt option -print-pipeline-passes."
This reverts commit c71869ed4c.
2021-09-02 08:22:17 +02:00
Markus Lavin c71869ed4c [NPM] Added opt option -print-pipeline-passes.
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.

As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass

At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
  (currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
  BitcodeWriterPass).
2021-09-02 08:16:51 +02:00
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Jinsong Ji 8671191d26 [NFC][PowerPC] Small code refactor in LoopInstrFormPrep
Avoid some duplicate code.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D109083
2021-09-02 03:16:01 +00:00
Arthur Eubanks 7b08d9da55 Reland [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:58:57 -07:00
Arthur Eubanks 0f63496ea4 Revert "[MemorySSA] Add pass to print results of MemorySSA walker"
This reverts commit 8f98477c2d.

Breaks bots
2021-09-01 18:45:19 -07:00
Chen Zheng 2596120199 [PowerPC] small code format refactor ; NFC
address the code review comments in patch https://reviews.llvm.org/D105872
2021-09-02 01:39:32 +00:00