Commit Graph

30883 Commits

Author SHA1 Message Date
Jinsong Ji 2c9028170e [DebugInfo][AIX] Set target debugger-tune default to dbx
https://reviews.llvm.org/D99400 set clang DefaultDebuggerTuning for AIX
to dbx. However, we still need to update the target default so that llc
and other tools will get the same default debuggertuning, and avoid
passing extra options in LTO.

Reviewed By: #powerpc, shchenz, dblaikie

Differential Revision: https://reviews.llvm.org/D101197
2021-04-26 01:38:44 +00:00
Xiang1 Zhang 3b8ec86fd5 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-04-25 09:45:41 +08:00
RamNalamothu 0ce723cb22 [NFC] Refactor how CFI section types are represented in AsmPrinter
In terms of readability, the `enum CFIMoveType` didn't better document what it
intends to convey i.e. the type of CFI section that gets emitted.

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D76519
2021-04-24 23:29:42 +05:30
Dávid Bolvanský ef2dc7ed9f [Analysis] Attribute alignment should not prevent tail call optimization
Fixes tail folding issue mentioned in D100879.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D101230
2021-04-24 19:57:42 +02:00
RamNalamothu 4e87fdd786 [NFC] Delete the redundant member 'shouldEmitMoves' from DwarfCFIException class
The data member 'shouldEmitMoves' is only used in DwarfCFIException::beginFunction()
and 'shouldEmitCFI' in DwarfCFIExceptionBase serves its purpose.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101155
2021-04-24 06:35:39 +05:30
Michael Kitzan 59f2dd5f1a [MachineCSE] Prevent CSE of non-local convergent instrs
At the moment, MachineCSE allows CSE-ing convergent instrs which are
non-local to each other. This can cause illegal codegen as convergent
instrs are control flow dependent. The patch prevents non-local CSE of
convergent instrs by adding a check in isProfitableToCSE and rejecting
CSE-ing if we're considering CSE-ing non-local convergent instrs. We
can still CSE convergent instrs which are in the same control flow
scope, so the patch purposely does not make all convergent instrs
non-CSE candidates in isCSECandidate.

https://reviews.llvm.org/D101187
2021-04-23 16:44:48 -07:00
Snehasish Kumar 3da0aeea08 [NFC] Use hasSection instead of getSection().empty()
Use the optimized check hasSection() instead of calling
getSection().empty(). Originally suggested in D101004, but was dropped
in the commit.
2021-04-23 10:00:38 -07:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Stephen Tozer 791930d740 Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e.
2021-04-23 10:54:01 +01:00
Chen Zheng 027d6735ae [Debug-Info] change return type to void for attribute adding functions.
Make following function return void:

    addLabel()
    addSectionLabel()
    addSectionDelta()

This aligns with other attributes adding functions.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101022
2021-04-23 04:16:55 -04:00
Serguei Katkov 914c832824 [InlineSpiller] Clean-up isSpillCandBB
This is mostly NFC except that for end of BB not previous slot is used.
Idx is used to find a def of sibling live interval in that slot.
The def on end of MBB and on previous slot of end MBB should be the same,
so it should be NFC.

Reviewers: reames, qcolombet, MatzeB, wmi, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100922
2021-04-23 10:16:02 +07:00
Fangrui Song 2786e673c7 [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.

Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)

Also document the function attribute "frame-pointer" which is long overdue.

Differential Revision: https://reviews.llvm.org/D101016
2021-04-22 18:07:30 -07:00
Craig Topper 5185b52988 [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.

Fixes PR50083
2021-04-22 15:18:15 -07:00
Jun Ma 978eb3f168 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Snehasish Kumar 8077d0ff5c [CodeGen] Do not split functions with attr "implicit-section-name".
The #pragma clang section can be used at a coarse granularity to specify
the section used for bss/data/text/rodata for global objects. When split
functions is enabled, the function may be split into two parts violating
user expectations.

Reference:
https://clang.llvm.org/docs/LanguageExtensions.html#specifying-section-names-for-global-objects-pragma-clang-section

Differential Revision: https://reviews.llvm.org/D101004
2021-04-21 21:51:33 -07:00
Michael Holman 77357208c4 [CodeView] Add CodeView support for PGO debug information
This change adds debug information about whether PGO is being used or
not.

Microsoft performance tooling (e.g. xperf, WPA) uses this information to
show whether functions are optimized with PGO or not, as well as whether
PGO information is invalid.

This information is useful for validating whether training scenarios are
providing good coverage of real world scenarios, showing if profile data
is out of date, etc.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99994
2021-04-21 15:29:19 -07:00
Petr Hosek 7a718e1630 [MC] Use COMDAT for LSDA only if IR comdat type is any
This fixed issue introduced in 16af973933
and 796feb6163.

Differential Revision: https://reviews.llvm.org/D100909
2021-04-21 14:41:39 -07:00
Hongtao Yu b6db6f5530 [CSSPGO] Exclude pseudo probe from slotindex verification. 2021-04-21 09:17:12 -07:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Simon Pilgrim d860bf2d0e [DAG] TargetLowering.cpp - breakup if-else chains where each block returns. NFCI.
Match style guide that requests that if+return blocks are separate.
2021-04-21 11:17:27 +01:00
Fraser Cormack c141bd3cf9 [DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100851
2021-04-21 11:05:37 +01:00
Adrian Prantl 81cad0be68 Make sure PHIElimination doesn't copy debug locations across basic blocks.
PHIElimination may insert copy instructions in multiple basic
blocks. Moving debug locations across basic block boundaries would be
misleading as illustrated by the test case.

rdar://75463656

Differential Revision: https://reviews.llvm.org/D100886
2021-04-20 17:03:29 -07:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Simon Pilgrim bc98076ff6 Silence MSVC signed/unsigned comparison warning. NFCI. 2021-04-20 17:20:13 +01:00
Simon Pilgrim 2a419a0b99 [X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements
Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements
2021-04-20 17:09:49 +01:00
Matt Arsenault 620fdb9671 GlobalISel: Defer register creation in handleAssignments
This is currently built on top of the SelectionDAG call lowering, but
does not use it the same way. SelectionDAG passes legalized types to
the assignment functions, and the tablegenerated assignment functions
may change the value types expected for registers. This does not
change the types used, just moves the register creation to help fix
this in the future.

Defer the register creation until after all of the assignment
decisions have been made. This will also help have correct tail call
compatibility checking in a future change. Currently it does not work
as expected for any arguments split across multiple registers.
2021-04-20 11:48:12 -04:00
Matt Arsenault 14b03b4aad GlobalISel: Check for powers of 2 for inverse funnel shift lowering
This doesn't make a practical difference since it would only be broken
if a target actually had a legal non-power-of-2 inverse shift.
2021-04-20 11:30:22 -04:00
Matt Arsenault 83a25a1010 GlobalISel: Restrict narrow scalar for fptoui/fptosi results
This practically only works for the f16 case AMDGPU uses, not wider
types.

Fixes bug 49710 by failing legalization.
2021-04-20 10:54:40 -04:00
Matt Arsenault 8fbe04f46b MachineVerifier: Continue reporting errors for copies
This was skipping verification of later copies, but generally the
verifier tries to report as many things wrong as possible in the
function.
2021-04-20 10:54:40 -04:00
Simon Pilgrim e156f2515c [DAG] SelectionDAG.cpp - breakup if-else chains where each block returns. NFCI.
Match style guide that requests that if+return blocks are separate.
2021-04-20 12:37:00 +01:00
Serguei Katkov 70193bdfc0 Re-land [GreedyRA ORE] Add Cost of spill locations into remark
Re-land the patch with a fix of clang test.

Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.

While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.

Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark""
This reverts commit 680f3d6de7.
2021-04-20 16:21:07 +07:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
Fraser Cormack 457da7f298 [SelectionDAG] Relax constraints on STEP_VECTOR step operand
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.

As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.

This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100660
2021-04-20 08:41:42 +01:00
Serguei Katkov 680f3d6de7 Revert "[GreedyRA ORE] Add Cost of spill locations into remark"
This reverts commit 328377307a.

This commit causes buildbot failures due to some clang tests are not updated.
Temporary revert to fix clang tests.
2021-04-20 11:08:24 +07:00
Serguei Katkov 328377307a [GreedyRA ORE] Add Cost of spill locations into remark
Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.

While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100020
2021-04-20 10:47:22 +07:00
Hongtao Yu b98807df05 [CSSPGO] Exclude pseudo probes from slot index
Pseudo probe are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of enlarged lifetime length with pseudo probe instructions. As a consequence, program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from stack index and downstream register allocation related passes.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100334
2021-04-19 17:55:35 -07:00
Hongtao Yu 1812319292 [CSSPGO] Flip SkipPseudoOp to true for MIR APIs.
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion is are disabled for better counts quality. For these two optimizations, this is a no-diff change.

The counts quality with SPEC2017 before/after this change is unchanged.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100332
2021-04-19 17:55:34 -07:00
Roman Tereshin 76b0ea7f2d Reset NextFnNum in MachineModuleInfo::initialize
In an env that reuses compiler instances for multiple compilations, this
omission results in non-deterministic assembly output (names of the
auto-generated labels) if the order or full set of Modules compiled
varies.

Differential Revision: https://reviews.llvm.org/D100797
2021-04-19 15:51:30 -07:00
David Penry ca8eef7e3d [CodeGen] Use ProcResGroup information in SchedBoundary
When the ProcResGroup has BufferSize=0,

1. if there is a subunit in the list of write resources for the
   scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
   scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
   implied by a InstRW is not supported.

Used to model parallel uses from a pool of resources.

Differential Revision: https://reviews.llvm.org/D98976
2021-04-19 21:27:45 +01:00
Jessica Paquette 91bbb914e0 [AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.

This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.

For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.

Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.

Differential Revision: https://reviews.llvm.org/D100398
2021-04-19 10:47:49 -07:00
OCHyams 0ebf9a8e34 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Serguei Katkov 9f33943ee0 [GreedyRA ORE] Add stats for copy of virtual registers.
Greedy RA adds copies of virtual registers when splitting live interval.
This stat might be useful.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
2021-04-19 12:43:44 +07:00
Serguei Katkov 61d22f2e4e [Greedy RA] Add a check to MachineVerifier
If Virtual Register is alive in landing pad its def must be
before the call causing the exception or it should be statepoint instruction itself and
in this case def actually means the relocation of gc pointer and is alive in
landing pad.

The test shows the triggering this check for an option under development
use-registers-for-gc-values-in-landing-pad which is off by default until
it is functionally correct.

Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100525
2021-04-19 12:31:18 +07:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
serge-sans-paille 550ed575cb Simplify BitVector code
Instead of managing memory by hand, delegate it to std::vector. This makes the
code much simpler, and also avoids repeatedly computing the storage size.

According to valgrind --tool=callgrind, this also slightly decreases the
instruction count, but by a small margin.

This is a recommit of 82f0e3d3ea with one usage
fixed in llvm/lib/CodeGen/RegisterScavenging.cpp.

Not the slight API change: BitVector::clear() now has the same behavior as any
other container: it does not free memory, but indeed sets the size of the
BitVector to 0. It is thus incorrect to access its content right afterwards, a
scenario which wasn't enforced in previous implementation.

Differential Revision: https://reviews.llvm.org/D100387
2021-04-16 22:48:33 +02:00
Simon Pilgrim 37a4621fb6 [DAG] SelectionDAG::isSplatValue - early out if binop is not splat. NFCI.
Just return false if we fail to match splats - the remainder of the code is for (fixed)vector operations - shuffles/insertions etc.
2021-04-16 18:26:33 +01:00
Jonathan Crowther e71994a239 [SystemZ][z/OS] Add IsText Argument to GetFile and GetFileOrSTDIN
Add the `IsText` argument to `GetFile` and `GetFileOrSTDIN` which will help z/OS distinguish between text and binary correctly. This is an extension to [this patch](https://reviews.llvm.org/D97785)

Reviewed By: abhina.sreeskantharajan, amccarth

Differential Revision: https://reviews.llvm.org/D100488
2021-04-16 10:08:36 -04:00
Momchil Velikov f9d932e673 [clang][AArch64] Correctly align HFA arguments when passed on the stack
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example

    struct S {
      __attribute__ ((__aligned__(16))) double v[4];
    };

Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)

Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.

This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.

The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.

For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.

On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.

Patch by Momchil Velikov and Lucas Prates.

Differential Revision: https://reviews.llvm.org/D98794
2021-04-15 22:58:14 +01:00
Jun Ma 7e1422c1e4 [DAGCombiner] Fold step_vector with add/mul/shl
This patch implements some DAG combines for STEP_VECTOR:
add step_vector(C1), step_vector(C2) -> step_vector(C1+C2)
add (add X step_vector(C1)), step_vector(C2) -> add X step_vector(C1+C2)
mul step_vector(C1), C2 -> step_vector(C1*C2)
shl step_vector(C1), C2 -> step_vector(C1<<C2)

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100088
2021-04-15 18:06:35 +08:00
Tim Northover 6401b78ab3 SDAG: constant fold bf16 -> i16 casts
This direction is particularly useful because i16 constants are much more
likely to be legal than bf16.
2021-04-14 11:27:46 +01:00
Serguei Katkov cf0d3477aa [GreedyRA ORE] Separate Folder Reloads and Zero Cost Folder Reloads
Patchpoint instructions have operands which is actually zero cost
(or the same as register) to use the value from the stack.
In terms of statistic it makes same to separate them.

Move from computation instructions related to stack spill/reload to
number of stack slot referenced.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100016
2021-04-14 14:25:28 +07:00
Serguei Katkov 02265ed7ad [Live Intervals] Teach Greedy RA to recognize special case live-through
Statepoint instruction has a deopt section which is actually live-through the call.
Currently this is handled by special post pass after RA - fixup-statepoint-caller-saved.

This change teaches Greedy RA that if segment of live interval is ended with statepoint
instruction and its reg is used in deopt bundle then this live interval interferes regmask of this statepoint
and as a result caller-saved register cannot be assigned to this live interval.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100296
2021-04-14 13:26:49 +07:00
Serguei Katkov d5ed0d4816 [Live Intervals] Factor-out unionBitMask. NFC.
For further re-usage in other place.
2021-04-14 10:39:01 +07:00
Tim Northover 5e3d9fcc3a StackProtector: ensure protection does not interfere with tail call frame.
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.

Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case because such call
frames cannot be nested in LLVM the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
2021-04-13 15:14:57 +01:00
Amy Huang dad5caa59e Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This change causes an assert / segmentation fault in LTO builds.

This reverts commit f2e4f3eff3.
2021-04-12 20:10:17 -07:00
Serguei Katkov c362179b0a [GreedyRA ORE] Add debug location for function level report
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100168
2021-04-13 09:49:12 +07:00
Stephen Tozer f2e4f3eff3 Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
The causes of the previous build errors have been fixed in revisions
aa3e78a59f, and
140757bfaa

This reverts commit f40976bd01.
2021-04-12 16:57:29 +01:00
Stephen Tozer aa3e78a59f Reapply "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Fixed memory leak error by using BumpAllocator for SDDbgValue arrays.

This reverts commit 1b589172bd.
2021-04-12 12:51:29 +01:00
Zhang Qing Shan d69c236e1d [NFC][Debug] Fix unnecessary deep-copy for vector to save compiling time
We saw some big compiling time impact after enabling the debug entry value
feature for X86 platform(D73534). Compiling time goes from 900s->1600s with
our testcase. It is caused by allocating/freeing the memory busily.

'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'
The value for this map is vector, and we miss the reference when access the
element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();` which is a DenseMap.

Reviewed by: djtodoro, flychen50

Differential Revision: https://reviews.llvm.org/D100162
2021-04-12 14:55:03 +08:00
Chen Zheng bb346146a5 [Debug-Info] make fortran CHARACTER(1) type as valid unsigned type
This resolves https://bugs.llvm.org/show_bug.cgi?id=49872

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D100015
2021-04-11 23:17:01 -04:00
Arthur Eubanks c88b87f9ce Revert "Remove "Rewrite Symbols" from codegen pipeline"
This reverts commit 6210261ecb.

addr-label.ll crashes on armv7.
2021-04-10 23:28:16 -07:00
Arthur Eubanks 6210261ecb Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
psas managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99707
2021-04-10 22:38:44 -07:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Sebastian Neubauer ba217b4655 [RegisterScavenging] Add asserts for better errors
These cases were failing before, but with cryptic asserts.
Add asserts in the RegScavenger that fail earlier with better
messages. NFC

Differential Revision: https://reviews.llvm.org/D100109
2021-04-09 11:23:36 +02:00
Serguei Katkov f6e3b4fe58 [GreedyRA ORE] Re-factor computeNumberOfSplillsReloads.
Replace if-else to if-continue usage.
This simplifies further extension of the collected stats.
2021-04-09 12:44:11 +07:00
Stephen Tozer 1b589172bd Revert "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Reverted due to failure on the sanitizer-x86_64-linux-fast bot.

This reverts commit e10493eb50.
2021-04-08 17:55:45 +01:00
Stephen Tozer e10493eb50 [DebugInfo] Correctly track SDNode dependencies for list debug values
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().

Differential Revision: https://reviews.llvm.org/D99423
2021-04-08 17:01:45 +01:00
Serguei Katkov a0e8738d45 [GreedyRA ORE] Add function level spill/reloads stats
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100014
2021-04-08 16:55:52 +07:00
Serguei Katkov 6b64c662c7 [GreedyRA ORE] Extract computeNumberOfSplillsReloads to use in different places. NFC.
Extract one basic block handling to introduce stat computation for method scope.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100013
2021-04-08 14:40:45 +07:00
Serguei Katkov df25787797 [GreedyRA ORE] Extract stats in RAGreedyStats struct. NFC.
Combine all collected stats into separate struct RAGreedyStats
with add and report methods.

The motivation is to extend the number of statistics to capture and instead of
adding new parameters, just combine all of them into one structure.
Additionally I plan to use report from different places in future to report data
for function as well.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
2021-04-08 14:27:37 +07:00
Serguei Katkov 0a1c6637a1 [GreedyRA ORE] Compute ORE stats if extra analysis is enabled
To save compile time, avoid computation of stats if ORE will not emit it.
The motivation is to add more stats and compute it only if it will dumped.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
2021-04-08 14:24:18 +07:00
Esme-Yi 0c36da722a [Debug-Info] Use inlined strings in .dwinfo section by default for DBX.
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D99933
2021-04-08 07:20:22 +00:00
Hongtao Yu 2a2720a2de [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine.
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.

Test Plan:

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D100002
2021-04-07 22:45:35 -07:00
Arthur Eubanks 90af134473 Revert "[AsmPrinter] Delete dead takeDeletedSymbsForFunction()"
This reverts commit 9583a3f262.

This wasn't NFC as initially thought. Needed for D99707.
2021-04-07 11:40:44 -07:00
Craig Topper 67953311e2 [SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.

I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.

This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D99682
2021-04-07 10:03:33 -07:00
Yevgeny Rouban 3e738afae4 [Statepoint Lowering] Allow other than N byte sized types in deopt bundle
I do not see any bit-width restriction from the point of the
LLVM Lang Ref - Operand Bundles on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
2021-04-07 17:48:31 +07:00
Nicolás Alvarez a1aada75f5 [docs] Fix doxygen comments wrongly attached to the llvm namespace
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:

- File doc comments that aren't marked with \file, so they're attached to
  the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
  class.
- Code comments before the namespace that (in my opinion) shouldn't be
  extracted by doxygen at all.

This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.

Reviewed By: thakis, mizvekov

Differential Revision: https://reviews.llvm.org/D96736
2021-04-07 01:20:18 +02:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames fb41cae039 More precisely type code used for gc.relocate assertions [nfc] 2021-04-06 11:27:36 -07:00
Abhina Sreeskantharajan 82b3e28e83 [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.

Solution:
This patch adds two new flags

  - OF_CRLF which indicates that CRLF translation is used.
  - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
  if (Flags & OF_CRLF)
    CrtOpenFlags |= _O_TEXT;
```

These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99426
2021-04-06 07:23:31 -04:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Serguei Katkov 0057ec8034 [Statepoint] Factor-out utility function to get non-foldable area of STATEPOINT like instructions. NFC
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99875
2021-04-06 11:44:37 +07:00
Stanislav Mekhanoshin 30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also has fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Mircea Trofin ce61def529 [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-04-01 08:33:28 -07:00
Simon Pilgrim 77d625f8d8 [DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.

This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
2021-04-01 14:33:00 +01:00
Simonas Kazlauskas 777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Craig Topper 9e00b6660d [SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC
There's an assert right before that makes sure the size already matches.
Earlier in this function's life, scalars and vectors shared more
code.
2021-03-31 12:33:10 -07:00
Yang Fan 0d7fd9f0d0
[GlobalISel] Fix Wint-in-bool-context warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’:
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
 3882 |       Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
2021-03-31 09:59:43 +08:00
Amara Emerson a35c2c7942 [GlobalISel] Implement fewerElements legalization for vector reductions.
This patch adds 3 methods, one for power-of-2 vectors which use tree
reductions using vector ops, before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.

Differential Revision: https://reviews.llvm.org/D97163
2021-03-30 11:19:21 -07:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Sourabh Singh Tomar f13f050551 [DebugInfo] Support for signed constants inside DIExpression
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.

Test case IR is generated using Fortran front-end.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99273
2021-03-30 23:20:38 +05:30
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Amara Emerson f5e9be6fdb [GlobalISel] Implement lowering for G_ROTR and G_ROTL.
This is a straightforward port.

Differential Revision: https://reviews.llvm.org/D99449
2021-03-30 09:44:41 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Alok Kumar Sharma 9fb0025f70 [DebugInfo] Upgrade DISubragne::count to accept DIExpression also
This is needed for Fortran assumed shape arrays whose dimensions are
defined as,
  - 'count' is taken from array descriptor passed as parameter by
    caller, access from descriptor is defined by type DIExpression.
  - 'lowerBound' is defined by callee.
The current alternate way represents using upperBound in place of
count, where upperBound is calculated in callee in a temp variable
using lowerBound and count

Representation with count (DIExpression) is not only clearer as
compared to upperBound (DIVariable) but it has another advantage that
variable count is accessed by being parameter has better chance of
survival at higher optimization level than upperBound being local
variable.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99335
2021-03-30 09:16:55 +05:30
Rahman Lavaee 90c401cab6 [Propeller] Do not generate the BB address map for empty functions.
Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis.
This simple change skips the BB address map emission for such functions.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D99395
2021-03-29 20:15:01 -07:00
Roger Ferrer Ibanez 489ca73ac4 [PrologEpilogInserter][AMDGPU] Only adjust offset for emergency spill slots if the stack grows down
D89239 adjusts the stack offset of emergency spill slots for overaligned
stacks. However the adjustment is not valid for targets whose stack
grows up (such as AMDGPU).

This change makes the adjustment conditional only to those targets whose
stack grows down.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49686

Differential Revision: https://reviews.llvm.org/D99504
2021-03-29 17:26:58 +00:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant  helper function to check if a
SDValue is a integer of FP constant. This pattern is used in various
places.

There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
Amara Emerson 55533203d7 [GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates.
Differential Revision: https://reviews.llvm.org/D99383
2021-03-25 17:23:30 -07:00
Jessica Paquette 23f657c165 [AArch64][GlobalISel] Emit bzero on Darwin
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.

This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.

To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.

The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.

This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.

Code size improvements (Darwin):

- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)

Differential Revision: https://reviews.llvm.org/D99358
2021-03-25 17:14:25 -07:00
Amara Emerson 0d2c4db637 [GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF.
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.

rdar://75784009

Differential Revision: https://reviews.llvm.org/D99287
2021-03-24 23:08:51 -07:00
Sander de Smalen 55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Serguei Katkov 311d81ce97 [RegAlloc] Fix "ran out of regs" with uses in statepoint
Statepoint instruction is known to have a variable and big number of operands.
It is possible that Register Allocator will split live intervals in the way that all
physical registers are occupied by "zero-length" live intervals which are marked
as not-spillable.
While intervals are marked as not-spillable in the moment of creation when they are
really zero-length it is possible that in future as part of re-materialization there will
need for physical register between def and use of such tiny interval (the use is not
related to this interval at all).
As all physical registers are assigned to not-spillable intervals there is not avaialbe
registers and RA reports an error.

The idea of the fix is avoid marking tiny live intervals where there is a use in statepoint
instruction in var args section. Such interval may be perfectly spilled and folded to
operand of statepoint.

Reviewers: reames, dantrushin, qcolombet, dsanders, dmgreen
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98766
2021-03-24 10:25:34 +07:00
serge-sans-paille e19884cd74 Introduce a generic operator to apply complex operations to BitVector
This avoids temporary and memcpy call when computing large expressions.

It's basically some kind of poor man's expression template, but it seems easier
to maintain to have a single generic `apply` call instead of the whole
expression template machinery here.

Differential Revision: https://reviews.llvm.org/D98176
2021-03-23 14:23:26 +01:00
Matt Arsenault b24436ac96 GlobalISel: Lower funnel shifts 2021-03-23 09:11:17 -04:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Pushpinder Singh d0e5422eb8 [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93963
2021-03-23 05:45:43 +00:00
Max Kazantsev 105dc0f9de [NFC] Fix typo longre -> longer 2021-03-23 12:13:52 +07:00
Rahman Lavaee 949abf7d6a [llvm-readelf, propeller] Add fallthrough bit to basic block metadata in BB-Address-Map section.
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.

This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to the `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decide to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D96918
2021-03-22 21:38:05 -07:00
Sanjay Patel 664d0c052c [TargetTransformInfo] move branch probability query from TargetLoweringInfo
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.

I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
2021-03-22 15:55:34 -04:00
Matt Arsenault 9fdfd8dd52 GlobalISel: Add utility function to constant fold FP ops 2021-03-22 14:38:17 -04:00
Matt Arsenault c34819afe3 GlobalISel: Handle G_BUILD_VECTOR in isKnownToBeAPowerOfTwo 2021-03-22 14:20:35 -04:00
Craig Topper 2f13e63f9e [LegalizeDAG] Add asserts to verify the types of custom legalized operation matches the original node.
We've messed this up a few times recently on RISCV. Experiments
with these asserts found a couple issues on other targets as well.
They've all been cleaned up now so we can put in these asserts to
catch future issues

I had to waive Glue because ADDC/ADDE/etc legalization replaces
Glue with i32 on at least AArch64. X86 used to do the same before
we switched to ADDCARRY. So I guess that's just how that works.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98979
2021-03-22 10:28:51 -07:00
Craig Topper 30080b003e [DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization.
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits)

So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
2021-03-21 11:16:41 -07:00
Matt Arsenault 20a24af01d MIR: Fix missing serialization for HasTailCall 2021-03-21 13:14:04 -04:00
Matt Arsenault 1098acd46d GlobalISel: Avoid unnecessary truncation to i64
We can just directly pass through the APInt to create a new constant.
2021-03-21 10:07:41 -04:00
Simon Pilgrim 64c2641c89 [DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.
2021-03-21 14:01:37 +00:00
Jessica Paquette 4773dd5ba9 [GlobalISel] Add G_SBFX + G_UBFX (bitfield extraction opcodes)
There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG.

E.g, ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain
code that matches a bitfield extract from an and + right shift.

Rather than duplicating code in the same way, this adds two opcodes:

- G_UBFX (unsigned bitfield extract)
- G_SBFX (signed bitfield extract)

They work like this

```
%x = G_UBFX %y, %lsb, %width
```

Where `lsb` and `width` are

- The least-significant bit of the extraction
- The width of the extraction

This will extract `width` bits from `%y`, starting at `lsb`. G_UBFX zero-extends
the result, while G_SBFX sign-extends the result.

This should allow us to use the combiner to match the bitfield extraction
patterns rather than duplicating pattern-matching code in each target.

Differential Revision: https://reviews.llvm.org/D98464
2021-03-19 14:37:19 -07:00
Simon Pilgrim 9d2df96407 [DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.

Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.

Differential Revision: https://reviews.llvm.org/D98857
2021-03-19 16:02:31 +00:00
Simon Pilgrim ffb2887103 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w))
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.

Differential Revision: https://reviews.llvm.org/D98646
2021-03-19 14:14:56 +00:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Simon Pilgrim 1ba5c550d4 [DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x)
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.

Noticed while investigating the poor codegen in D98587.
2021-03-18 15:34:53 +00:00
Matt Arsenault b9a0384983 GlobalISel: Preserve source value information for outgoing byval args
Pass through the original argument IR value in order to preserve the
aliasing information in the memcpy memory operands.
2021-03-18 09:16:54 -04:00
Matt Arsenault 61f834cc09 GlobalISel: Insert memcpy for outgoing byval arguments
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.

Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).

This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
2021-03-18 09:16:54 -04:00
Simon Pilgrim b1afa187c8 [DAG] SelectionDAG::isSplatValue - add ISD::ABS handling
Add ISD::ABS to the existing unary instructions handling for splat detection

This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring.

Differential Revision: https://reviews.llvm.org/D98778
2021-03-18 10:28:29 +00:00
Amara Emerson 28963d895b [GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers.
These are pseudos without any users, so DCE was killing them in the combiner.

Marking them as having side effects doesn't seem quite right since they don't.

Gives a nice 0.3% geomean size win on CTMark -Os.

Differential Revision: https://reviews.llvm.org/D98811
2021-03-17 18:02:08 -07:00
Amara Emerson d7fed7b899 [AArch64][GlobalISel] Fall back if disabling neon/fp in the translator.
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.

We should fall back earlier, in the translator, to avoid this issue.

Differential Revision: https://reviews.llvm.org/D98730
2021-03-17 15:08:08 -07:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
Nikita Popov 40bc309911 Revert "[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration"
This reverts commit d40b4911bd.

This causes a large compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions
2021-03-16 20:41:26 +01:00
Mircea Trofin d40b4911bd [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-03-16 12:10:10 -07:00
serge-sans-paille 35368bbdbb [NFC] Replace loop by idiomatic llvm::find_if 2021-03-16 12:49:19 +01:00
serge-sans-paille 6e040a19db [NFC] Wisely nest dyn_cast in FunctionLoweringInfo
Take advantage of the inheritance tree to avoid a few comparison.
2021-03-16 10:22:44 +01:00
Fangrui Song 5d44c92bf8 Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
Fraser Cormack 0035decae7 [CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs
This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.

When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.

Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.

Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98495
2021-03-15 17:04:21 +00:00
Philip Reames 7d38a91a7f Restore fixed version of "[CodeGenPrepare] Fix isIVIncrement (PR49466)"
Change was reverted in commit 8d20f2c2c6 because it was causing an infinite loop.  9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
2021-03-13 15:25:02 -08:00
Philip Reames 9228f2f322 [CGP] Consolidate logic for getIVIncrement and isIVIncrement
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700).  The root issue was that we have two transforms which are inverses of each other.  We use one for simple induction variables (where we can use the post-inc form), and the other for everything else.  The problem was that the two transforms could disagree about whether something was an induction variable.

The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher.  However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths.

The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence.

Triggering the infinite loop requires side stepping the SunkAddrs cache.  The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache.

Unfortunately, I have not been able to construct a standalone IR test case for this.  The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce.  (Including capturing the IR at point of the transform itself!)  I have no idea what weird state clang is creating here.

I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
2021-03-13 14:55:25 -08:00
Craig Topper 5b825433d7 [DAGCombiner] Optimize 1-bit smulo to AND+SETNE.
A 1-bit smulo overflows is both inputs are -1 since the result
should be +1 which can't be represented in a signed 1 bit value.

We can detect this with an AND and a setcc. The multiply result
can also use the same AND.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97634
2021-03-13 09:39:36 -08:00
Jordan Rupprecht 8d20f2c2c6 Revert "[CodeGenPrepare] Fix isIVIncrement (PR49466)"
This reverts commit cf82700af8 due to a compile timeout when building the following with `clang -O2`:

```
template <class, class = int> class a;
struct b {
  using d = int *;
};
struct e {
  using f = b::d;
};
class g {
public:
  e::f h;
  e::f i;
};
template <class, class> class a : g {
public:
  long j() const { return i - h; }
  long operator[](long) const noexcept;
};
template <class c, class k> long a<c, k>::operator[](long l) const noexcept {
  return h[l];
}
template <typename m, typename n> int fn1(m, n, const char *);
int o, p;
class D {
  void q(const a<long> &);
  long r;
};
void D::q(const a<long> &l) {
  int s;
  if (l[0])
    for (; l.j(); ++s) {
      if (l[s])
        while (fn1(o, 0, ""))
          ;
      r = l[s] / p;
    }
}
```
2021-03-12 13:59:14 -08:00
Craig Topper 2ea7014089 [DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD.
This allows us to optimize when the mask is a splat_vector in
addition to build_vector.
2021-03-12 12:14:56 -08:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
Matt Arsenault 6b76d82853 GlobalISel: Fix marking byval arguments as immutable
byval arguments need to be assumed writable. Only implicitly stack
passed arguments which aren't addressable in the IR can be assumed
immutable.

Mips is still broken since for some reason its doing its own thing
with the ValueHandlers (and x86 doesn't actually handle byval
arguments now, although some of the code is there).
2021-03-12 09:01:53 -05:00
Matt Arsenault 34471c3060 GlobalISel: Partially fix handling of byval arguments
This was essentially ignoring byval and treating them as a pointer
argument which needed to be loaded from. This should copy the frame
index value to the virtual register, not insert a load from the frame
index into the pointer value.

For AMDGPU, this was producing a load from the byval pointer argument,
to a pointer used for the byval arguments. I do not understand how
AArch64 managed to work before since it appears to be similarly
broken.

We could also change the ValueHandler API to avoid the extra copy from
the frame index, since currently it returns a new register.

I believe there is still an issue with outgoing byval arguments. These
should have a copy inserted in case the callee decided to overwrite
the memory.
2021-03-12 09:01:53 -05:00
LemonBoy cfe69c8efd [SelectionDAG] Improve scalarization of irregular vector types
Use a more general strategy when splitting a vector into scalar parts (and vice-versa) to correctly handle vector types whose element size is not a power of 2 (and a multiple of 8).

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D98273
2021-03-11 19:57:13 +01:00
David Green fad70c3068 [ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
...
bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729
2021-03-11 17:56:19 +00:00
Craig Topper e9426dfbae [ValueTypes][RISCV] Add MVT for v1f16.
RISCV makes all fixed vector MVTs with size less than or equal
to a command line option legal.

This didn't include v1f16 because it was missing but did include v1f32 and v1f64.

One test is affected where we did test this type, but it is a horizontal
reduction so it is non-sensical. Perhaps we should canonicalize that
away somewhere.

I'm not sure if we should be making v1 types legal, but this will at
least make RISCV consistent across all types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98365
2021-03-11 09:23:18 -08:00
Matt Arsenault cf5ecd5644 GlobalISel: Fix off by one in finding explicit byval alignment
For attribute sets, the return index is at 0, and arguments start at
1. getParamAlignment adds the offset of 1, so we need to convert from
attribute index back to IR index.
2021-03-11 10:23:08 -05:00
Stephen Tozer f40976bd01 Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reverts commit c0f3dfb9f1.

Reverted due to an error on the clang-x64-windows-msvc buildbot.
2021-03-11 14:48:01 +00:00
gbtozers c0f3dfb9f1 [DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-03-11 13:33:49 +00:00
Serguei Katkov 0480927712 [Statepoint Lowering] Handle the case with several gc.result
Recently gc.result has been marked with readnone instead of readonly and
this opens a door for different optimization to duplicate gc.result.
Statepoint lowering is not ready to see several gc.results.
The problem appears when there are gc.results with one located in the same
basic block and another located in other basic block.
In this case we need both export VR and fill local setValue.

Note that this case is not sufficient optimization done before CodeGen.
It is evident that local gc.result dominates all other gc.results and it is handled
by GVN and EarlyCSE.

But anyway, even if IR is not optimal Backend should not crash on a valid IR.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98393
2021-03-11 18:44:44 +07:00
David Blaikie 80d1f657a1 Fix unused lambda capture in a non-asserts build
For locally scoped lambdas like this there's no particular benefit to
explicitly listing captures - or avoiding capturing this. Switch to [&]
and make it all easier to maintain.

(& driveby change std::function to llvm::function_ref)
2021-03-11 00:22:18 -08:00
Daniel Sanders 134a179dee [mir] Change 'undef' for MMO base addresses to 'unknown-address'
Differential Revision: https://reviews.llvm.org/D98100
2021-03-10 16:46:44 -08:00
Quentin Colombet 66dab2fa84 [NFC] Fix compiler warnings
Fix warnings caused by -Wrange-loop-analysis.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D98298
2021-03-10 11:03:50 -08:00
Craig Topper 9106d04554 [RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32.
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.

This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.

For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've remove the special case I previously put in for
ABS for D97991 as the default expansion is now able to succesfully
use getConstant.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98004
2021-03-10 09:46:18 -08:00
Stephen Tozer 1db137b185 [DebugInfo] Handle DBG_VALUES with multiple variable location operands in MIR
This patch adds handling for DBG_VALUE_LIST in the MIR-passes (after
finalize-isel), excluding the debug liveness passes and DWARF emission. This
most significantly affects MachineSink, which now needs to consider all used
registers of a debug value when sinking, but for most passes this change is
simply replacing getDebugOperand(0) with an iteration over all debug operands.

Differential Revision: https://reviews.llvm.org/D92578
2021-03-10 17:15:24 +00:00
Stephen Tozer e64f3ccca3 Reapply "[DebugInfo] Add DWARF emission for DBG_VALUE_LIST"
This reverts commit 429c6ecbb3.
2021-03-10 15:59:24 +00:00
Stephen Tozer 429c6ecbb3 Revert "[DebugInfo] Add DWARF emission for DBG_VALUE_LIST"
This reverts commit 0da27ba56c.

This revision was causing an error on the sanitizer-x86_64-linux-autoconf build.
2021-03-10 14:35:33 +00:00
gbtozers 0da27ba56c [DebugInfo] Add DWARF emission for DBG_VALUE_LIST
This patch allows DBG_VALUE_LIST instructions to be emitted to DWARF with valid
DW_AT_locations. This change mainly affects DbgEntityHistoryCalculator, which
now tracks multiple registers per value, and DwarfDebug+DwarfExpression, which
can now emit multiple machine locations as part of a DWARF expression.

Differential Revision: https://reviews.llvm.org/D83495
2021-03-10 13:46:20 +00:00
Christudasan Devadasan 4c6ab48fb1 GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM
It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. It effectively reduces
the number of instructions than individually expanding them.

Reviewed By: arsenm, paquette

Differential Revision: https://reviews.llvm.org/D96013
2021-03-10 18:46:07 +05:30
Jinzheng Tu 481079e284 [NFC] Unify FIME with FIXME in comments
There are 5 occurrences FIME and 15333 FIXME. All of them should be FIXME.

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D98321
2021-03-10 14:00:51 +01:00
Serguei Katkov 2fccd1b00a [Statepoint Lowering] Fix the crash with gc.relocate in a separate block
If it was decided to relocate derived pointer using the spill its value is
not exported in general case.
When gc.relocate is located in an another block than a statepoint we cannot
get SD for derived value but for spill case it is not required at all.
However implementation of gc.relocate lowering unconditionally request SD value
causing the assert triggering.

The CL fixes this by handling spill case earlier than SD is really required.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98324
2021-03-10 19:51:04 +07:00
gbtozers 7d0cafba96 [DebugInfo] Process DBG_VALUE_LIST in LiveDebugVariables
This patch adds support for DBG_VALUE_LIST in the LiveDebugVariables pass. The
changes are mostly in computeIntervals, extendDef, and addDefsFromCopies; when
extending the def of a DBG_VALUE_LIST the live ranges of every used register
must be considered, and when such a def is killed by more than one of its used
registers being killed at the same time it is necessary to find valid copies of
all of those registers to create a new def with.

The DebugVariableValue class has also been changed to reference multiple
location numbers instead of just one. This has been accomplished by using a
C-style array with a unique_ptr and an array length packed into 6 bits, to
minimize the size of the class (which must be kept low to be used with
IntervalMap). This may not be the most efficient solution possible, and should
be looked at if performance issues arise.

Differential Revision: https://reviews.llvm.org/D83895
2021-03-10 12:37:59 +00:00
Philip Reames d6394d86ca [cgp] improve robustness of uadd/usub transforms
LSR prefers to schedule iv increments just before the latch.  The recent 80511565 broadened this to moving increments in the original IR.  This pointed out a robustness problem with the CGP transform.

When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens e.g. unanalyzeable loops) we'd avoid performing the uadd/usub transform.  Interestingly, all of these involve moving the increment closer to it's operands, so there's no concern about dominating all uses.  We can handle that case cheaply, resulting in a more robust transform.
2021-03-09 11:52:08 -08:00
Amara Emerson 55e760769b [GlobalISel] Fold away G_BUILD_VECTOR with all elements extracted.
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.

Differential Revision: https://reviews.llvm.org/D97890
2021-03-09 11:34:26 -08:00
Philip Reames e85d798b5b [cgp] group related code together [nfc] 2021-03-09 11:23:15 -08:00
Amara Emerson e60ab72137 [AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst)
Differential Revision: https://reviews.llvm.org/D97835
2021-03-09 11:08:02 -08:00
gbtozers e2196ddcdb [DebugInfo] Process DBG_VALUE_LIST in LiveDebugValues
This patch implements DBG_VALUE_LIST handling to the LiveDebugValues pass. This
is a substantial change, and makes a few fundamental changes to the existing
logic.

We still use the basic model of a VarLocMap that is indexed by a LocIndex, with
a VarLocSet (a CoalescingBitVector underneath) giving us efficient lookups of
existing variable locations for a given location type. The main change is that
the VarLocMap may contain a given VarLoc multiple times (once for each unique
location operand), so that a VarLoc can be looked up from any of the registers
that it uses. This means that each VarLoc has multiple corresponding LocIndexes;
to allow us to iterate through the set of VarLocs (previously we would iterate
through the VarLocSet), we now also maintain a single entry in the VarLocMap
that contains every VarLoc exactly once.

The VarLoc class itself is also changed; this change is much simpler,
refactoring out location-specific members into a MachineLocation class and
adding a vector of these locations.

Differential Revision: https://reviews.llvm.org/D83890
2021-03-09 18:58:26 +00:00
Nikita Popov 55ae279ba7 [FastISel] Don't trivially kill extractvalues (PR49467)
All extractvalues of the same value at the same index will map to
the same register, so even if one specific extractvalue only has
one use, we should not mark it as a trivial kill, as there may be
more extractvalues later.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49467.

Differential Revision: https://reviews.llvm.org/D98145
2021-03-09 18:46:38 +01:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
gbtozers 5491a86f59 [DebugInfo] Emit DBG_VALUE_LIST from ISel
This patch completes ISel support for DIArgList dbg.values by allowing
SDDbgValues with multiple location operands to be emitted as DBG_VALUE_LIST
instructions.

The primary change of this patch is refactoring EmitDbgValue by pulling location
operand emission out to the new function AddDbgValueLocationOps, which is used
for both DIArgList and single value dbg.values. Outside of that, the only
behaviour change is that the scheduler has a lambda added, HasUnknownVReg, to
prevent us from attempting to emit a DBG_VALUE_LIST before all of its used VRegs
have become available.

Differential Revision: https://reviews.llvm.org/D88592
2021-03-09 12:17:39 +00:00
John Brawn bf3a271960 [CodeGen] Report a normal instead of fatal error for label redefinition
A symbol being redefined as a label is something that can happen as a result of
ordinary input, so it shouldn't cause a fatal error. Also adjust the error
message to match the one you get when a symbol is redefined as a variable.

Differential Revision: https://reviews.llvm.org/D98181
2021-03-09 10:54:41 +00:00
Cullen Rhodes 2750f3ed31 [IR] Introduce llvm.experimental.vector.splice intrinsic
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on a immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count

These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94708
2021-03-09 10:44:22 +00:00
gbtozers 93b170ea24 [DebugInfo] Handle dbg.values with multiple variable location operands in ISel
This patch adds partial support in Instruction Selection for dbg.values that use
a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but
adds the logic for processing DIArgLists within the ISel pass. This change is
largely focused on handleDebugValue and some of the functions that it calls.
Outside of this, salvageDebugInfo and transferDbgValues have been modified to
replace individual operands instead of the entire value; dangling debug info for
variadic debug values is not currently supported (but may be added later).

Differential Revision: https://reviews.llvm.org/D88589
2021-03-09 09:48:03 +00:00
Ta-Wei Tu cf82700af8 [CodeGenPrepare] Fix isIVIncrement (PR49466)
In the NFC commit 8d835f42a5, the check for `!L` is
moved to a separate function `getIVIncrement` which, instead of using `BO->getParent()`,
uses `PN->getParent()`. However, these two basic blocks are not necessarily the same.

https://bugs.llvm.org/show_bug.cgi?id=49466 demonstrates a case where `PN` is contained in
a loop while `BO` is not, causing the null-pointer dereference in `L->getLoopLatch()`.

This patch checks whether both `BO` and `PN` belong to the same loop before entering `getIVIncrement`.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D98144
2021-03-09 13:32:34 +08:00
Jessica Paquette f7d73a6b9e [SelectionDAG] Don't scalarize vector fpround sources that don't need it.
Similar to the workaround code in ScalarizeVecRes_UnaryOp, ScalarizeVecRes_SETCC
, ScalarizeVecRes_VSELECT, etc.

If we have a case like this:

```
define <1 x half> @func(<1 x float> %x) {
  %tmp = fptrunc <1 x float> %x to <1 x half>
  ret <1 x half> %tmp
}
```

On AArch64, the <1 x float> is legal. So, this will crash if we call
GetScalarizedVector on it.

Differential Revision: https://reviews.llvm.org/D98208
2021-03-08 14:37:33 -08:00
Jessica Paquette 5c26be214d [AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP
If we have

```
%vec = G_BUILD_VECTOR %reg, %reg, ..., %reg
```

Then lower it to

```
%vec = G_DUP %reg
```

Also update the selector to handle constant splats on G_DUP.

This will not combine when the splat is all zeros or ones. Tablegen-imported
patterns rely on these being G_BUILD_VECTOR.

Minor code size improvements on CTMark at -Os.

Also adds some utility functions to make it a bit easier to recognize splats,
and an AArch64-specific splat helper.

Differential Revision: https://reviews.llvm.org/D97731
2021-03-08 13:01:10 -08:00
Min-Yih Hsu 6dcc325ce0 [M68k][MIR](2/8) Changes in the target-independent MIR part
- Add new callback in `TargetInstrInfo` --
  `isPCRelRegisterOperandLegal` -- to query whether pc-rel
   register MachineOperand is legal.
 - Add new function to search DebugLoc in a reverse ordering

Authors: myhsu, m4yers, glaubitz

Differential Revision: https://reviews.llvm.org/D88386
2021-03-08 12:30:57 -08:00
Stephen Tozer c0450af559 Fix: [DebugInfo] Support representation of multiple location operands in SDDbgValue
Removes a "default" label from a fully covered switch, causing errors on
-Wcovered-switch-default builds.
2021-03-08 19:14:12 +00:00
gbtozers 9525af7b91 [DebugInfo] Support representation of multiple location operands in SDDbgValue
This patch modifies the class that represents debug values during ISel,
SDDbgValue, to support multiple location operands (to represent a dbg.value that
uses a DIArgList). Part of this class's functionality has been split off into a
new class, SDDbgOperand.

The new class SDDbgOperand represents a single value, corresponding to an SSA
value or MachineOperand in the IR and MIR respectively. Members of SDDbgValue
that were previously related to that specific value (as opposed to the
variable or DIExpression), such as the Kind enum, have been moved to
SDDbgOperand. SDDbgValue now contains an array of SDDbgOperand instead, allowing
it to hold more than one of these values.

All changes outside SDDbgValue are simply updates to use the new interface.

Differential Revision: https://reviews.llvm.org/D88585
2021-03-08 18:45:17 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
serge-sans-paille 08d9e2ceec [NFC] Avoid useless BitVector move 2021-03-08 15:16:23 +01:00
Craig Topper 0eb405c3b8 [SelectionDAG] Add computeKnownBits support for ISD::USUBSAT.
The result of ISD::USUBSAT will never be larger than the LHS. We
can use this to put a bound on the number of leading zeros.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98133
2021-03-07 09:48:42 -08:00
LemonBoy 2ec43e4167 [LegalizeDAG] Implement promotion rules for SELECT_CC
Implement the promotion rule for SELECT_CC nodes by upcasting all the parameters and downcasting the result.
The AArch64 target makes use of this rule and, since it was not implemented, in some cases the instruction selector would hit an assertion upon encountering the illegal node.

This patch requires D97840, the included test cases hit both problems.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97859
2021-03-05 18:22:55 +01:00
Stephen Tozer f677413071 Reapply "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values"
Rewrites test to use correct architecture triple; fixes incorrect
reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour
so as to not be dependent on changes elsewhere in the patch stack.

This reverts commit d2000b45d0.
2021-03-05 12:32:05 +00:00
Petar Avramovic d44f61f81c Reland [GlobalISel] Combine zext(trunc x) to x
Recommit 4112299ee7. Depends on
4c8fb7ddd6 which was reverted.

Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-05 11:05:37 +01:00
Craig Topper ad532be012 [SelectionDAG] Assert that operands to SelectionDAG::getNode are not DELETED_NODE to catch issues like PR49393 earlier.
I'm not sure this would catch all such issues, but it would catch some.

The problem for PR49393 was that we were holding a reference to a node that
wasn't connect edto the DAG across a function that could delete unused nodes. In
this particular case we managed to try to use the deleted node while it was in
the deleted state before its memory got recycled.

It could also happen that we delete the node, something allocates a new node
which recycles the memory. Then  we try to use the reference we were holding and
it is now a completely different node with different valid opcode. This patch
would not catch that.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97969
2021-03-04 23:05:32 -08:00
Craig Topper 74e6030bcb [TargetLowering] Use HandleSDNodes to prevent nodes from being deleted by recursive calls in getNegatedExpression.
For binary or ternary ops we call getNegatedExpression multiple
times and then compare costs. While we're doing this we need to
hold a node from the first call across the second call, but its
not yet attached to the DAG. Its possible the second call creates
an identical node and then decides it didn't need it so will try
to delete it if it has no uses. This can cause a reference to the
node we're holding further up the call stack to become invalidated.

To prevent this, we can use a HandleSDNode to artifically give
the node a use without connecting it to the DAG.

I've used a std::list of HandleSDNodes so we can create handles
only when we have a node to hold. HandleSDNode does not have
default constructor and cannot be copied or moved.

Fixes PR49393.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97914
2021-03-04 22:48:25 -08:00
Chen Zheng 87bbf3d1f8 [XCOFF][DebugInfo] support DWARF for XCOFF for assembly output.
Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D95518
2021-03-04 21:07:52 -05:00
Heejin Ahn 561abd83ff [WebAssembly] Disable uses of __clang_call_terminate
Background:

Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.

`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
  __cxa_begin_catch(exn);
  std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.

In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.

The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```

In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.

---

Problem:

While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.

Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.

---

Solution (?):

This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.

The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.

And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.

Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
  and `__clang_call_terminate` function in terminate pads anymore; we
  simply generate calls to `std::terminate()`, which is the default
  implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
  `WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
  already `catch_all` now (because they don't need the exception
  pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
  tested `LateEHPrepare::ensureSingleBBTermPads` and
  `HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
  declarations

Fixes https://github.com/emscripten-core/emscripten/issues/13582.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97834
2021-03-04 14:26:35 -08:00
Petar Avramovic d7834556b7 Reland [GlobalISel] Start using vectors in GISelKnownBits
This is recommit of 4c8fb7ddd6.
MIR in one unit test had mismatched types.

For vectors we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). KnownBits BitWidth for vector
type is size of vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D96122
2021-03-04 21:47:13 +01:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Daniel Sanders 9fc2be6f28 [mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero
:: (store 1 + 4, addrspace 1)
->
:: (store 1 into undef + 4, addrspace 1)

An offset without a base isn't terribly useful but it's convenient to update
the offset without checking the value. For example, when breaking apart
stores into smaller units

Differential Revision: https://reviews.llvm.org/D97812
2021-03-04 10:34:30 -08:00
Philip Reames 6af94d22f7 [cgp] Defer lazy domtree usage to last possible point
This is a compile time optimization for d9e93e8e5. Not sure this matters or not, but why not do it just in case.

This does involve querying TLI with a potentially invalid addressing mode for the using instruction, but since we don't actually pass the using instruction to the TLI callback, that should be fine.
2021-03-04 10:19:45 -08:00
Philip Reames e0cfd45171 [CGP] Lazily compute domtree only when needed during address matching
This is a compile time optimization for d9e93e8e5.  As pointed out in post dommit review on the original review (D96399), there was a moderately large compile time regression with this patch and the eager computation of domtree on matcher construction is the first obvious candidate for why.
2021-03-04 09:32:57 -08:00
Nico Weber 59beb1ef6d Revert "[GlobalISel] Combine zext(trunc x) to x"
This reverts commit 4112299ee7.
Seems to depend on 4c8fb7ddd6 which
is being reverted.
2021-03-04 10:13:40 -05:00
Nico Weber 4b1015361c Revert "[GlobalISel] Start using vectors in GISelKnownBits"
This reverts commit 4c8fb7ddd6.
Breaks check-llvm everywhere, see https://reviews.llvm.org/D96122
2021-03-04 10:13:40 -05:00
Petar Avramovic 4112299ee7 [GlobalISel] Combine zext(trunc x) to x
Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-04 15:05:23 +01:00
Petar Avramovic 4c8fb7ddd6 [GlobalISel] Start using vectors in GISelKnownBits
For vectors we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). KnownBits BitWidth for vector
type is size of vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D96122
2021-03-04 15:05:23 +01:00
Jann Horn 91c9dee3fb [CodeGenPrepare] Eliminate llvm.expect before removing empty blocks
CodeGenPrepare currently first removes empty blocks, then in a loop
performs other optimizations. One of those optimizations is the removal
of call instructions that invoke @llvm.assume, which can create new
empty blocks.

This means that when a branch only contains a call to __builtin_assume(),
the empty branch will survive into MIR, and will then only be
half-removed by MIR-level optimizations (e.g. removing the branch but
leaving the condition intact).

Fix it by eliminating @llvm.expect builtin calls before removing empty
blocks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D97848
2021-03-04 14:48:26 +01:00
Simon Pilgrim 7d3d9fe8cd [DAG] TargetLowering::BuildUDIV - use APInt as const ref. NFCI.
Fixes clang-tidy warning.
2021-03-04 12:15:08 +00:00
Stephen Tozer d2000b45d0 Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values"
This reverts commit d07f106f4a.
2021-03-04 11:59:21 +00:00
gbtozers d07f106f4a [DebugInfo] Add new instruction and DIExpression operator for variadic debug values
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.

The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.

The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”

Differential Revision: https://reviews.llvm.org/D82363
2021-03-04 11:45:35 +00:00
Max Kazantsev 9d5af55589 [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 2
This patch enables the case where we do not completely eliminate offset.
Supposedly in this case we reduce live range overlap that never harms, but
since there are doubts this is true, this goes as a separate change.

Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames
2021-03-04 16:47:43 +07:00
Max Kazantsev d9e93e8e57 [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 1
While optimizing the memory instruction, we sometimes need to add
offset to the value of `IV`. We could avoid doing so if the `IV.next` is
already defined at the point of interest. In this case, we may get two
possible advantages from this:

- If the `IV` step happens to match with the offset, we don't need to add
  the offset at all;
- We reduce overlap of live ranges of `IV` and `IV.next`. They may stop overlapping
  and it will lead to better register allocation. Even if the overlap will preserve,
  we are not introducing a new overlap, so it should be a neutral transform (Disabled
  this patch, will come with follow-up).

Currently I've only added support for IVs that get decremented using `usub`
intrinsic. We could also support `AddInstr`, however there is some weird
interaction with some other transform that may lead to infinite compilation
in this case (seems like same transform is done and undone over and over).
I need to investigate why it happens, but generally we could do that too.

The first part only handles case where this reuse fully elimiates the offset.

Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames
2021-03-04 15:22:55 +07:00
Craig Topper 90b7825598 [LegalizeVectorTypes] Remove a tautological compare. 2021-03-03 23:26:00 -08:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Hongtao Yu ad2a59f584 [CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.

To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.

If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95962
2021-03-03 22:44:41 -08:00
Jin Lin 7c2192b277 Add the use of register r for outlined function when register r is live in and defined later.
The compiler needs to mark register $x0 as live in for the following case.

    $x1 = ADDXri $sp, 16, 0
    BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95267
2021-03-03 15:14:11 -08:00
Baptiste Saleil 54c0f520c7 [VirtRegRewriter] Insert missing killed flags when tracking subregister liveness
VirtRegRewriter may sometimes fail to correctly apply the kill flag where necessary,
which causes unecessary code gen on PowerPC. This patch fixes the way masks for
defined lanes are computed and the way mask for used lanes is computed.

Contact albion.fung@ibm.com instead of author for problems related to this commit.

Differential Revision: https://reviews.llvm.org/D92405
2021-03-03 12:02:04 -05:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Matt Arsenault 78dcff4841 GlobalISel: Add default implementation of assignValueToReg
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.

This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
2021-03-03 09:29:53 -05:00
Craig Topper 543b901e58 [LegalizeVectorTypes] Improve SplitVecRes_INSERT_SUBVECTOR to handle subvector being in the high half of the split or not at element 0 of the low half.
This function isn't exercised in lit tests today today according to
the code coverage report. But will be after the tests in D97543 and
D97559.

Posting this patch to help a crash that Fraser hit.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97582
2021-03-02 21:14:13 -08:00
Victor Huang 1756b2adc9 [AIX][TLS] Generate TLS variables in assembly files
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.

Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang

Differential Revision: https://reviews.llvm.org/D96184
2021-03-02 18:22:48 -06:00
Matt Arsenault fd82cbcf7d GlobalISel: Merge and cleanup more AMDGPU call lowering code
This merges more AMDGPU ABI lowering code into the generic call
lowering. Start cleaning up by factoring away more of the pack/unpack
logic into the buildCopy{To|From}Parts functions. These could use more
improvement, and the SelectionDAG versions are significantly more
complex, and we'll eventually have to emulate all of those cases too.

This is mostly NFC, but does result in some minor instruction
reordering. It also removes some of the limitations with mismatched
sizes the old code had. However, similarly to the merge on the input,
this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we
actually want, but SelectionDAG is stuck using the weird emergent
ABI).

This also changes the load/store size for stack passed EVTs for
AArch64, which makes it consistent with the DAG behavior.
2021-03-02 17:31:13 -05:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Sanjay Patel 415c67ba4c [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a2
2021-03-02 14:29:15 -05:00
Sanjay Patel 7fce3322a2 [SDAG] allow vector types for select->logic folds
This prepares codegen for a change that will remove the identical
folds from IR because they are not poison-safe. See
D93065 / D97360
for details.

We already generically support scalar types, and there are various
target-specific transforms that overlap the vector folds. For example,
x86 recognizes the and patterns, but not or. We can end up with 1
extra instruction there, but I think that is still preferred over the
blendv alternative that loads a constant vector.

If this is not optimal, then it should be fixed with a later transform
(this change is not expected to result in any regressions because
InstCombine currently does the same thing).

Removing custom code and supporting undefs in constant-pattern-matching
can be follow-up changes.

Differential Revision: https://reviews.llvm.org/D97730
2021-03-02 09:25:10 -05:00
Simon Pilgrim c0d4b44e6a [DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI.
Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.
2021-03-02 13:29:31 +00:00
Yuanfang Chen 5de2d189e6 [Diagnose] Unify MCContext and LLVMContext diagnosing
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.

* Kill LLVMContext inlineasm diagnose handler and migrate it to use
  DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
  covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
  is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
  use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
  handler; if LLVMContext is not available, MCContext uses its own default
  diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D97449
2021-03-01 15:58:37 -08:00
Nikita Popov c35761db0f [GlobalISel] Bail on G_PHI narrowing of odd types (PR48188)
The current narrowing code for G_PHI can only handle the case
where the size is a multiple of the narrow size. If this is not
the case, fall back to SDAG instead of asserting.

Original patch by shepmaster.

Differential Revision: https://reviews.llvm.org/D92446
2021-03-01 23:30:50 +01:00
Matt Arsenault 0131498402 GlobalISel: Remove dead code
Generic code should probably not introduce G_INSERT/G_EXTRACT. The
mirror unpackRegs should also be removed, but AMDGPU still has a use
remaining which needs to be fixed.
2021-03-01 17:06:43 -05:00
Sanjay Patel 154c47dc06 [SDAG] add helper for select->logic folds; NFC
This set of transforms should be extended to handle vector types.
2021-03-01 16:24:15 -05:00
Wouter van Oortmerssen a0f4526836 [WebAssembly] Fix split-dwarf not emitting DW_OP_WASM_location correctly
It was using the regular path for target indices that uses uleb, but TI_GLOBAL_RELOC needs to be uint32_t.
Introduced here: https://reviews.llvm.org/D85685
Fixes: https://github.com/emscripten-core/emscripten/issues/13240

Differential Revision: https://reviews.llvm.org/D97564
2021-03-01 11:53:30 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Craig Topper e745f7c563 [LegalizeTypes] Improve ExpandIntRes_XMULO codegen.
The code previously used two BUILD_PAIRs to concatenate the two UMULO
results with 0s in the lower bits to match original VT. Then it created
an ADD and a UADDO with the original bit width. Each of those operations
need to be expanded since they have illegal types.

Since we put 0s in the lower bits before the ADD, the lower half of the
ADD result will be 0. So the lower half of the UADDO result is
solely determined by the other operand. Since the UADDO need to
be split in half, we don't really needd an operation for the lower
bits. Unfortunately, we don't see that in type legalization and end up
creating something more complicated and DAG combine or
lowering aren't always able to recover it.

This patch directly generates the narrower ADD and UADDO to avoid
needing to legalize them. Now only the MUL is done on the original
type.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97440
2021-03-01 09:54:32 -08:00
Matt Arsenault 361cfdf228 GlobalISel: Verify G_CONCAT_VECTORS has at least 2 sources 2021-03-01 09:10:36 -05:00
Matt Arsenault 6c260d3bc0 GlobalISel: Move splitToValueTypes to generic code
I copied the nearly identical function from AArch64 into AMDGPU, so
fix this duplication.

Mips and X86 have their own more exotic versions which should be
removed. However replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
2021-03-01 08:58:18 -05:00
Simon Pilgrim 9dd83f5ee8 [DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle.
Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.
2021-03-01 10:42:11 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
Serguei Katkov 65fb706231 [Statepoint Lowering] Consider dead deopt gc values together with other gc values
Currently dead gc value mentioned in the deopt section are not listed in gc section
and so are processed separately.
With this CL all deopt gc values are considered as base pointers and processed in the
same way as other gc values.

The fact that deopt gc pointer is a base pointer was used all the time but
it is explicitly documented here by putting the value in SI.Base.

The idea of the patch comes from Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97554
2021-03-01 17:23:02 +07:00
Simon Pilgrim 64c41301ce [DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI.
Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.
2021-03-01 09:42:00 +00:00
Max Kazantsev 9fac8496ea [NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow
Current callers do not call it with such argument, so this is NFC.
But for further changes, it can be very useful to detect such cases.
2021-03-01 13:24:01 +07:00
Max Kazantsev 8d835f42a5 [NFC] Introduce function getIVStep for further reuse 2021-03-01 13:04:56 +07:00
Max Kazantsev fdbad5e5ac [NFC] Whitespace fix 2021-03-01 12:14:03 +07:00
Max Kazantsev 2892fcc204 [NFC] Factor out IV detector function for further reuse 2021-03-01 12:11:54 +07:00
Serguei Katkov 06c5119c76 [Statepoint lowering] Require spill of deopt value in case its type is not legal
If the type of the deopt operand has an illegal type and we want to use
register for it then it needs to be legalized.
This is not supported currently by legalizer and it is not actually clear how to
legalize this type of values.

Instead we just spill such values and use spill slot location in statepoint.

Originally tests were created by Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97541
2021-03-01 10:23:53 +07:00
Craig Topper 5de09ef02e [DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg.
Peeking through AND is only valid if the input to both shifts is
the same. If the inputs are different, then the original pattern
ORs the two values when the masked shift amount is 0. This is ok
if the values are the same since the OR would be a NOP which is
why its ok for rotate.

Fixes PR49365 and reverts PR34641

Differential Revision: https://reviews.llvm.org/D97637
2021-02-28 12:58:00 -08:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Craig Topper ca5247bb17 [DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits.
Even if the first computeKnownBits call doesn't have any zero
bits it is possible the other operand has bitwidth-1 leading zero.
In that case overflow is still impossible. So always call computeKnownBits
for both operands.
2021-02-28 08:26:22 -08:00
Heejin Ahn aa097ef8d4 [WebAssembly] Fix reverse mapping in WasmEHFuncInfo
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.

This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97583
2021-02-26 17:12:10 -08:00
Fangrui Song 47c5576d7d ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects
If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.

For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.

SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We don't do the restriction - see the rationale in D95749.

The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.

With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.

Differential Revision: https://reviews.llvm.org/D97448
2021-02-26 16:38:44 -08:00
Craig Topper eea53b142d [DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible.
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.

This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97160
2021-02-26 14:50:03 -08:00
James Y Knight 6de6455752 Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available.
These locations were missed as part of adding alignment to the
instructions, and were still making their own alignment assumptions.
2021-02-26 15:06:15 -05:00
Philip Reames 0832a58e22 [cgp] Minor code improvement - reuse an existing named helper [NFC] 2021-02-26 11:51:32 -08:00
Mircea Trofin 3e992326a5 [NFC][regalloc] const-ed APIs, using MCRegister instead of unsigned 2021-02-26 09:54:20 -08:00
Mircea Trofin a2bfc43ae1 [NFC] Const-ed 2 APIs in VirtRegMap 2021-02-26 09:32:42 -08:00
James Y Knight 740e69b6fd Fix assert to use getTypeStoreSize instead of getPrimitiveSizeInBits,
per comment on D97223.
2021-02-26 11:08:00 -05:00
Simon Pilgrim aefe8f2f6c [DAG] Fold vXi1 multiplies -> and
This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math.

Mentioned in D97478 post review.
2021-02-26 11:46:12 +00:00
Simon Pilgrim 73adc26ac0 [DAG] expandAddSubSat - break if-else chain. NFCI.
Fix styleguide issue - each if() block always returns so we don't need to make them a if-else chain.
2021-02-26 11:02:08 +00:00
Chen Zheng d39bc36b1b [debug-info] refactor emitDwarfUnitLength
remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we
can make caller of emitDwarfUnitLength easier.

Reviewed By: MaskRay, dblaikie, ikudrin

Differential Revision: https://reviews.llvm.org/D96409
2021-02-25 21:00:25 -05:00
James Y Knight 24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Simon Pilgrim 9490b9f14b [DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode()
As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.
2021-02-25 17:49:26 +00:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
David Sherwood 87dbcd8865 [CodeGen] Canonicalise adds/subs of i1 vectors using XOR
When calling SelectionDAG::getNode() to create an ADD or SUB
of two vectors with i1 element types we can canonicalise this
to use XOR instead, where 1+1 is treated as wrapping around
to 0 and 0-1 wraps to 1.

I've added the following tests for SVE targets:

  CodeGen/AArch64/sve-pred-arith.ll

and modified some X86 tests to reflect the much simpler codegen
required.

Differential Revision: https://reviews.llvm.org/D97276
2021-02-25 10:31:26 +00:00
Craig Topper fe50be12c8 [LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported.
Rather than converting 3 signbits to bools and comparing them,
we can do bitwise logic on the whole vector and convert the
resulting sign bit to a bool at the end.

This is still a different algorithm than what we do in LegalizeDAG
through expandSADDOSSUBO. That algorithm needs to know that the
RHS of SSUBO is > 0, but that's costly when the type is split.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97325
2021-02-24 10:05:38 -08:00
Sander de Smalen 5e19208d96 [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep.
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-24 14:30:03 +00:00
Simon Pilgrim 8082bfe7e5 [DAG] Add basic mul-with-overflow constant folding support
As noticed on D97160
2021-02-24 11:09:02 +00:00
Petr Hosek 11a53f47fb Revert "[InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF"
This reverts commit 6b286d93f7 because
in some cases when the optimizer evaluates the global initializer,
__llvm_prf_cnts may not be entirely zero initialized.
2021-02-24 00:41:43 -08:00
Craig Topper cb6fc4b0a3 [LegalizeIntegerTypes] Use GetExpandedInteger instead of SplitInteger in ExpandIntRes_XMULO.
We know the input is going to be expanded as well, so we should
just ask for the already expanded operands. Otherwise we create
nodes that are just going to need to be legalized.
2021-02-23 23:53:45 -08:00
Chen Zheng be5d92e37e [Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class
We may need to do some customization for DWARF unit length in DWARF
section headers for some targets for some code generation path.

For example, for XCOFF in assembly path, AIX assembler does not require
the debug section containing its debug unit length in the header.

Move emitDwarfUnitLength to MCStreamer class so that we can do
customization in different Streamers

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D95932
2021-02-23 21:29:05 -05:00
Heejin Ahn ea8c6375e3 [WebAssembly] Fix incorrect grouping and sorting of exceptions
This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247
2021-02-23 14:54:55 -08:00
Amara Emerson 4691405ba9 Fix a range-loop-analysis warning. 2021-02-23 14:41:08 -08:00
Heejin Ahn 445f4e7484 [WebAssembly] Disable wasm.lsda() optimization in WasmEHPrepare
In every catchpad except `catch (...)`, we add a call to
`_Unwind_CallPersonality`, which is a wapper to call the personality
function. (In most of other Itanium-based architectures the call is done
from libunwind, but in wasm we don't have the control over the VM.)
Because the personatlity function is called to figure out whether the
current exception is a type we should catch, such as `int` or
`SomeClass&`, `catch (...)` does not need the personality function call.
For the same reason, all cleanuppads don't need it.

When we call `_Unwind_CallPersonality`, we store some necessary info in
a data structure called `__wasm_lpad_context` of type
`_Unwind_LandingPadContext`, which is defined  in the wasm's port of
libunwind in Emscripten. Also the personality wrapper function returns
some info (selector and the caught pointer) in that data structure, so
it is used as a medium for communication.

One of the info we need to store is the address for LSDA info for the
current function. `wasm.lsda()` intrinsic returns that address. (This
intrinsic will be lowered to a symbol that points to the LSDA address.)
The simpliest thing is call `wasm.lsda()` every time we need to call
`_Unwind_CallPersonality` and store that info in `__wasm_lpad_context`
data structure. But we tried to be better than that (D77423 and some
more previous CLs), so if catchpad A dominates catchpad B and catchpad A
is not `catch (...)`, we didn't insert `wasm.lsda()` call in catchpad B,
thinking that the LSDA address is the same for a single function and we
already visited catchpad A and `__wasm_lpad_context.lsda` field would
already have that value.

But this can be incorrect if there is a call to another function, which
also can have the personality function and LSDA, between catchpad A and
catchpad B, because `__wasm_lpad_context` is a globally defined
structure and the callee function will overwrite its `lsda` field.

So in this CL we don't try to do any optimizaions on adding
`wasm.lsda()` call; we store the result of `wasm.lsda()` every time we
call `_Unwind_CallPersonality`. We can do some complicated analysis,
like checking if there is a function call between the dominating
catchpad and the current catchpad, but at this time it seems overkill.

This deletes three tests because they all tested `wasm.ldsa()` call
optimization.

Fixes https://github.com/emscripten-core/emscripten/issues/13548.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D97309
2021-02-23 14:38:59 -08:00
Craig Topper eb165090bb [LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO.
This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0 which is non-canonical.
Canonical would be setgt X, -1. This misses the special case in
IntegerExpandSetCCOperands for sign bit tests that assumes
canonical form. If we don't hit this special case we end up
with a multipart setcc instead of just checking the sign of
the high part.

To fix this I've reversed the polarity of all of the setccs to
setlt X, 0 which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV which
lacks a setgt instruction.

This probably still isn't the best code sequence we could use here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97181
2021-02-23 09:40:32 -08:00
Jay Foad a6be26710b [GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC. 2021-02-23 17:08:34 +00:00
Cassie Jones 8f956a5e8f [GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96673
2021-02-22 19:59:36 -05:00
Cassie Jones e1532649cb [GlobalISel] Implement narrowScalar for SADDO/SSUBO
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96672
2021-02-22 19:59:36 -05:00
Cassie Jones c63b33b792 [GlobalISel] Implement narrowScalar for UADDO/USUBO
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96671
2021-02-22 19:59:35 -05:00
Amara Emerson 212d6a95ab [GloblalISel] Support lowering <3 x i8> arguments in multiple parts.
Differential Revision: https://reviews.llvm.org/D97086
2021-02-22 13:58:44 -08:00
Amara Emerson 69ce291bcc [AArch64][GlobalISel] Support lowering <1 x i8> arguments.
We don't yet have working codegen for the resulting unmerges, and if
we did it would probably be horrible.

Differential Revision: https://reviews.llvm.org/D97035
2021-02-22 13:58:44 -08:00
Heejin Ahn a08e609d2e [WebAssembly] Rename methods in WasmEHFuncInfo (NFC)
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D97173
2021-02-22 12:16:11 -08:00
Kazu Hirata ffba9e596d [CodeGen] Use range-based for loops (NFC) 2021-02-21 19:58:07 -08:00
Craig Topper 1a6c1ac686 [SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM.
This also removes a pattern from RISCV that is no longer needed
since the sexti32 on the LHS of the srem in the pattern implies
the result is sign extended so the sign_extend_inreg should be
removed in DAG combine now.

Reviewed By: luismarques, RKSimon

Differential Revision: https://reviews.llvm.org/D97133
2021-02-21 11:13:36 -08:00
Simon Pilgrim 38ab47c813 [DAG] Match USUBSAT patterns through zext/trunc
This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form:

zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
zext(x) >  y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))

Based on original examples:

void foo(unsigned short *p, int max, int n) {
    int i;
    unsigned m;
    for (i = 0; i < n; i++) {
        m = *--p;
        *p = (unsigned short)(m >= max ? m-max : 0);
    }
}

Differential Revision: https://reviews.llvm.org/D25987
2021-02-21 15:26:54 +00:00
Kazu Hirata 0b417ba20f [CodeGen] Use range-based for loops (NFC) 2021-02-20 21:46:02 -08:00
Petr Hosek 6b286d93f7 [InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF
This can reduce the binary size because counters will no longer occupy
space in the binary, instead they will be allocated by dynamic linker.

Differential Revision: https://reviews.llvm.org/D97110
2021-02-20 14:20:33 -08:00
Simon Pilgrim 761bbed264 [DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit)))
This moves the last custom x86 USUBSAT fold to generic DAGCombine.

Completes PR40111

Differential Revision: https://reviews.llvm.org/D96703
2021-02-20 12:02:07 +00:00
Kazu Hirata a205fa5cd9 [CodeGen] Use range-based for loops (NFC) 2021-02-19 22:44:14 -08:00
Pan, Tao 12edddafac [CodeGen] Fix two dots between text section name and symbol name
There is a trailing dot in text section name if it has prefix, don't add
repeated dot when connect text section name and symbol name.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D96327
2021-02-20 10:15:48 +08:00
Craig Topper baab797878 [ValueTypes] Assert if changeVectorElementType is called on a simple type with an extended element type.
Previously we would use the extended implementation, but
the extended implementation requires the vector type to be extended
so that we can access the LLVMContext. In theory we could
detect this case and use the context from the element type instead,
but since I know of no cases hitting this in practice today
I've done the simplest thing.

Also add asserts to several extended EVT functions that assume
LLVMTy is non-null.

Follow from discussion in D97036

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D97070
2021-02-19 17:30:46 -08:00
Mircea Trofin 82492f24ff [NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit
VirtRegAuxInfo is an extensibility point, so the register allocator's
decision on which implementation to use should be communicated to the
other users - namely, LiveRangeEdit.

Differential Revision: https://reviews.llvm.org/D96898
2021-02-19 07:44:28 -08:00
Simon Pilgrim 5d3930bb8f [DAG] visitTRUNCATE - attempt to truncate USUBSAT
Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))
2021-02-19 14:26:05 +00:00
Kazu Hirata fd04f3a30c [CodeGen] Use range-based for loops (NFC) 2021-02-18 22:46:43 -08:00
Adrian Prantl c4ad878acb Reset the EntryValue location flag in finalizeEntryValue.
This fixes an assertion error when entry values are combined with
DW_OP_LLVM_fragment.
2021-02-18 18:36:36 -08:00
Matt Arsenault 2d3d2e78d0 MIR: Fix parser crash on syntax error on first character
This was calling the diagnostic printer before the context member was
initialized.
2021-02-18 18:59:08 -05:00
Leonard Chan c77659e549 [llvm][IR] Do not place constants with static relocations in a mergeable section
This patch provides two major changes:

1. Add getRelocationInfo to check if a constant will have static, dynamic, or
   no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
   in a mergeable section.

This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.

See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.

Differential Revision: https://reviews.llvm.org/D95960
2021-02-18 15:39:00 -08:00
Matt Arsenault 62d946e133 GlobalISel: Merge some AMDGPU ABI lowering code to generic code
AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing it to the generic
code in handleAssignments. This is a bit sloppy and also requires some
overly fancy iterator work when building the calls. It's better if all
argument marshalling code is handled directly in
handleAssignments. This handles more situations like decomposing large
element vectors into sub-element sized pieces.

This should mostly be NFC, but does change the generated code by
shifting where the initial argument packing instructions are placed. I
think this is nicer looking, since it now emits the packing code
directly after the relevant copies, rather than after the copies for
the remaining arguments.

This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit
types. This is ultimately the better option, but incompatible with the
DAG. Fixing this requires more work, especially for f16.
2021-02-18 17:26:55 -05:00
Simon Pilgrim 53e83afcaf [DAG] getTruncatedUSUBSAT - always truncate operands. NFCI.
As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.
2021-02-18 21:28:55 +00:00
Guozhi Wei 66f2d09ebf [DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
If extload is legal, following transform
    (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.

Differential Revision: https://reviews.llvm.org/D95086
2021-02-18 13:15:20 -08:00
Philip Reames dcebe8ab1e Fix a buildbot warning triggered by 1dfb06d 2021-02-18 09:37:49 -08:00
Philip Reames 13753808f4 [verify-regalloc] Verify after allocation and before postOptimization
I've now hit several cases where a mistake in the regalloc main loop caused corrupt live intervals that didn't get caught until either the next verify or during post-optimization.  The later case is rather confusing and tends to lead one down false trails, so let's catch corruption before that.
2021-02-18 09:10:50 -08:00
Philip Reames 5318d9e516 [splitkit] Add a minor wrapper function for readability [NFC] 2021-02-18 09:00:22 -08:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Craig Topper 61d4d9a5d3 [TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode.
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros so use the largest encoding
requiring 9 bytes.

I believe its most like we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.

To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.

This gives a small reduction in the table size on all in tree
targets except VE where size increased by about 300 bytes due
to intrinsic ids now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets this an unfortunate
consquence of where VE is currently located in the range.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96317
2021-02-18 08:53:17 -08:00
Philip Reames 1dfb06d0b4 [regalloc] Add a couple of dump routines for ease of debugging [NFC] 2021-02-18 08:50:00 -08:00
Kazu Hirata 61efa3d93f [CodeGen] Use range-based for loops (NFC) 2021-02-17 23:58:46 -08:00
Kazu Hirata 8e13bbca08 [CodeGen] Use ListSeparator (NFC) 2021-02-17 23:58:43 -08:00
Yang Fan 796feb6163
[MC][ELF] Fix unused variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp: In member function ‘virtual llvm::MCSection* llvm::TargetLoweringObjectFileELF::getSectionForLSDA(const llvm::Function&, const llvm::MCSymbol&, const llvm::TargetMachine&) const’:
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:871:8: warning: variable ‘IsComdat’ set but not used [-Wunused-but-set-variable]
  871 |   bool IsComdat = false;
      |        ^~~~~~~~
```
2021-02-18 14:23:18 +08:00
Chen Zheng 5517923b1c [XCOFF][NFC] make csect properties optional for getXCOFFSection
We are going to support debug sections for XCOFF. So the csect
properties are not necessary. This patch makes these properties
optional.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D95931
2021-02-17 20:51:42 -05:00
Jessica Paquette e6064a6418 [GlobalISel] Implement computeKnownBits for G_ASSERT_SEXT
Implementation is the same as G_SEXT_INREG.

Differential Revision: https://reviews.llvm.org/D96899
2021-02-17 14:00:36 -08:00
Jessica Paquette 26fb036559 [GlobalISel] Implement computeNumSignBits for G_ASSERT_SEXT
Same implementation as G_SEXT_INREG.

Add a testcase to combine-sext-inreg for a concrete example, and a testcase
to KnownBitsTest.

Differential Revision: https://reviews.llvm.org/D96897
2021-02-17 13:53:17 -08:00
Jessica Paquette 60aa646441 [GlobalISel] Add G_ASSERT_SEXT
This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction
signifies that an operation was already sign extended from a smaller type.

This is useful for functions with sign-extended parameters.

E.g.

```
define void @foo(i16 signext %x) {
 ...
}
```

This adds verifier, regbankselect, and instruction selection support for
G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT.

Differential Revision: https://reviews.llvm.org/D96890
2021-02-17 13:10:34 -08:00
Mircea Trofin 3a030c2f2f [NFC][RegAlloc] InlineSpiller::Original is a Register 2021-02-17 12:07:59 -08:00
Derek Schuff 1f9e551a81 [WebAssembly] Do not use EHCatchret symbols with wasm EH
D94835 added support for WinEH to export public symbols pointing to
basic blocks which are catchret targets for use with Windows CET.
Wasm currently doesn't support public symbols to non-function code
addresses (they get treated like new functions in asm but then don't
lower to object files correctly).
It created them unconditionally for all catchret targets.

This change disables those symbols unless the exceptionHandlingType
is WinEH (since they aren't used with ExceptionHandling::Wasm)

Differential Revision: https://reviews.llvm.org/D96824
2021-02-17 11:22:48 -08:00
Marianne Mailhot-Sarrasin f0ec9f1bb3 [Pipeliner] Fixed optimization remarks and debug dumps Initiation
Interval value

The II value was incremented before exiting the loop, and therefor when
used in the optimization remarks and debug dumps it did not reflect the
initiation interval actually used in Schedule.

Differential Revision: https://reviews.llvm.org/D95692
2021-02-17 12:28:37 -05:00
Simon Pilgrim 87fbc06d06 [DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI.
This will simplify an incoming generic implementation of D25987.

I'll rebase D96703 shortly to support this.
2021-02-17 12:17:08 +00:00
Simon Pilgrim 05c64ea672 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED)
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-17 11:42:43 +00:00
Igor Kudrin aa84289629 [DebugInfo] Keep the DWARF64 flag in the module metadata
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.

Differential Revision: https://reviews.llvm.org/D96597
2021-02-17 17:03:34 +07:00
Sjoerd Meijer 7f3170ec19 [MachineSink] Add a loop sink limit
To make sure compile-times don't regress, add an option to restrict the number
of instructions considered for sinking as alias analysis can be expensive and
for the same reason also skip large blocks.

Differential Revision: https://reviews.llvm.org/D96485
2021-02-17 08:50:53 +00:00
Kazu Hirata 3279943adf [CodeGen] Use range-based for loops (NFC) 2021-02-16 23:23:08 -08:00
Sriraman Tallam d1a838babc Basic block sections should enable function sections implicitly.
Basic block sections enables function sections implicitly, this is not needed
and is inefficient with "=list" option.

We had basic block sections enable function sections implicitly in clang. This
is particularly inefficient with "=list" option as it places functions that do
not have any basic block sections in separate sections. This causes unnecessary
object file overhead for large applications.

This patch disables this implicit behavior. It only creates function sections
for those functions that require basic block sections.

Further, there was an inconistent behavior with llc as llc was not turning on
function sections by default. This patch makes llc and clang consistent and
tests are added to check the new behavior.

This is the first of two patches and this adds functionality in LLVM to
create a new section for the entry block if function sections is not
enabled.

Differential Revision: https://reviews.llvm.org/D93876
2021-02-16 16:27:16 -08:00
Petr Hosek 16af973933 [MC][ELF] Support for zero flag section groups
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.

Differential Revision: https://reviews.llvm.org/D95851
2021-02-16 14:23:40 -08:00
Sterling Augustine 5aa8f4c084 Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))"
This reverts commit 5dfba562dd.

That commit causes an assertion failure with the following repro:

typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}

./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
2021-02-16 12:48:15 -08:00
Simon Pilgrim df45c18135 [DAG] PromoteIntRes_ADDSUBSHLSAT - promote ISD::UADDSAT as clamped add
Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat.

I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path.

Differential Revision: https://reviews.llvm.org/D96767
2021-02-16 17:37:44 +00:00
Craig Topper 064ada4ec6 [SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors.
fde2466171 added support for
scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in
addition to BUILD_VECTOR. This was used to enabled UDIV/SDIV/UREM/SREM
by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp

The caller there expects to call getBuildVector from the match factors.
This leads to a crash right now if there is a SPLAT_VECTOR of
fixed vectors since the number of vectors won't match the number
of elements.

To fix this, this patch updates the callers to check the opcode
instead of whether the type is fixed or scalable. This assumes
that only 3 opcodes are handled by matchUnaryPredicate so
I've added an assertion to the final else to check that opcode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96174
2021-02-16 09:22:46 -08:00
Simon Pilgrim 5dfba562dd [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-16 15:46:34 +00:00
Simon Pilgrim 420420de57 [DAG] Avoid APInt copies by directly using the APInt reference from getAPIntValue. NFCI. 2021-02-16 13:50:34 +00:00
Simon Pilgrim dd879f7dc9 [DAG] Use APInt::extractBits instead of lshr().trunc(). NFCI.
Avoids so many APInt instances by directly using the APInt reference from getAPIntValue.
2021-02-16 13:50:33 +00:00
Kazu Hirata 22f00f61dd [CodeGen] Use range-based for loops (NFC) 2021-02-15 14:46:11 -08:00
Matt Arsenault 392e0fcfd1 GlobalISel: Handle arguments partially passed on the stack
The API is a bit awkward since you need to index into an array in the
passed struct. I guess an alternative would be to pass all of the
individual fields.
2021-02-15 17:06:14 -05:00
Matt Arsenault 1b3d8ddeb9 CodeGen: Move function to get subregister indexes to cover a LaneMask
Return the best covering index, and additional needed to complete the
mask. This logically belongs in TargetRegisterInfo, although I ended
up not needing it for why I originally split this out.
2021-02-15 17:05:37 -05:00
Craig Topper eb75f250fe [RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.

Simlilar was done for BSWAP previously.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96681
2021-02-15 12:33:16 -08:00
Adrian Prantl 09b832e74f Support emitting complex expressions that include entry values
This patch enables AsmPrinter support for complex expression with
entry values. It shouldn't AsmPrinter's call whether these are safe or
not but the pass who introduces the DW_OP_LLVM_entry_value. This patch
on its own has no effect on clang.

Differential Revision: https://reviews.llvm.org/D96559
2021-02-15 11:09:09 -08:00
Simon Pilgrim e47f21da61 [DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI.
This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.
2021-02-15 18:27:00 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Arlo Siemsen 080866470d Add ehcont section support
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.

This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.

This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.

The change includes a test that when the module flag is enabled the section is correctly generated.

The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).

In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.

Change originally authored by Daniel Frampton <dframpto@microsoft.com>

For more details, see MSVC documentation for `/guard:ehcont`
  https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94835
2021-02-15 14:27:12 +08:00
Cassie Jones 97a1cdb156 [GlobalISel] Disable vector types in narrowScalarAddSub
The implementation for vectors is broken and doesn't seem to be used by
anything. Explicitly remove support for them, they can be added again
later when they're properly implemented.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D95699
2021-02-14 18:06:32 -05:00
Cassie Jones 36246388ba [GlobalISel] Extract a narrowScalarAddSub method. NFC
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D95426
2021-02-14 18:06:32 -05:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Kazu Hirata d5adba10f0 [CodeGen] Use range-based for loops (NFC) 2021-02-13 20:41:39 -08:00
Simon Pilgrim 6f5a805bbb [DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y)
Alive2: https://alive2.llvm.org/ce/z/FzcrpH
2021-02-13 15:02:01 +00:00
Simon Pilgrim 0df15e5eff [DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y)
Alive2: https://alive2.llvm.org/ce/z/4nkNGh
2021-02-13 13:21:15 +00:00
Simon Pilgrim 60ba5397df [DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly
As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops.

I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later.

Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary.

Differential Revision: https://reviews.llvm.org/D96622
2021-02-13 12:35:10 +00:00
Simon Pilgrim 7ad0c573bd [DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth
We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.

Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162
2021-02-13 12:00:08 +00:00
Kazu Hirata 905cf88d18 [CodeGen] Use range-based for loops (NFC) 2021-02-12 23:44:33 -08:00
Adrian Prantl 982b891905 Store the LocationKind of an entry value buffer independently from the main LocationKind (NFC)
This patch hides the logic for setting the location kind of an entry
value inside the begin/finalize/cancel functions. This way we get rid
the strange workaround that is currently in setLocation().

In the future, this will allow us to set the location kind of the
entry value independently from the location kind of the main
expression.

Differential Revision: https://reviews.llvm.org/D96554
2021-02-12 16:59:39 -08:00
Jay Foad 7c749baa3a [GlobalISel] Simpler verification of G_SEXT_INREG and G_ASSERT_ZEXT
There's no need to call verifyVectorElementMatch since we already know
that the source and destination types are identical.

Differential Revision: https://reviews.llvm.org/D96589
2021-02-12 21:33:27 +00:00
Amara Emerson 5d6d9b63a3 [GlobalISel] Propagate extends through G_PHIs into the incoming value blocks.
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.

Some basic heuristics, and a target hook for AArch64 is added, to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.

There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.

Differential Revision: https://reviews.llvm.org/D95703
2021-02-12 11:52:52 -08:00
Simon Pilgrim 4841a225b7 [DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine
Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.

This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.

We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.

The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.

Differential Revision: https://reviews.llvm.org/D96413
2021-02-12 18:22:57 +00:00
Lukas Sommer 6577cef9b0 [CodeGen] New pass: Replace vector intrinsics with call to vector library
This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM
intrinsics operating on vector operands) with calls to a vector library.

Currently, calls to LLVM intrinsics are only replaced with calls to vector
libraries when scalar calls to intrinsics are vectorized by the Loop- or
SLP-Vectorizer.

With this pass, it is now possible to replace calls to LLVM intrinsics
already operating on vector operands, e.g., if such code was generated
by MLIR. For the replacement, information from the TargetLibraryInfo,
e.g., as specified via -vector-library is used.

This is a re-try of the original commit 2303e93e66 that was reverted
due to pass manager problems. Other minor changes have also been made.

Differential Revision: https://reviews.llvm.org/D95373
2021-02-12 12:53:27 -05:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Petar Avramovic f0d65f4096 AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum
Implements same logis as in SelectionDAG.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition and
never NaN when one operand is known non-NaN and other known non-SNaN.
G_FMINNUM and G_FMAXNUM are never NaN/SNaN when one of the operands
is known non-NaN/SNaN.

Differential Revision: https://reviews.llvm.org/D91716
2021-02-12 17:14:34 +01:00
Petar Avramovic 122c649c98 AMDGPU/GlobalISel: Check values of constants in isKnownNeverNaN
Differential Revision: https://reviews.llvm.org/D91714
2021-02-12 17:14:34 +01:00
Simon Pilgrim 2465541dc0 [DAG] DAGTypeLegalizer::PromoteIntRes_ADDSUBSHLSAT - break if-else chain. NFCI.
Style fixup - the if() block always returns so we can pull out the contents of the else() block.
2021-02-12 10:33:12 +00:00
Kazu Hirata d61b4cb9d8 [CodeGen] Use range-based for loops (NFC) 2021-02-11 23:31:31 -08:00
Amara Emerson de035c18cf [GlobalISel] Fix sext_inreg(load) combine to not move the originating load.
The builder was using the extend user as the insertion point, which meant that
we were incorrectly "moving" the load from its original position, and therefore
could violate memory operation ordering.
2021-02-11 19:27:09 -08:00
Snehasish Kumar 2c7077e67d [CodeGen] Split out cold exception handling pads.
Support for splitting exception handling pads was added in D73739. This
change updates the code to split out exception handling pads if profile
information indicates that they are cold. For a given function with
multiple landind pads, if one of them is hot they are all retained as
part of the hot code section.

Differential Revision: https://reviews.llvm.org/D96372
2021-02-11 11:23:43 -08:00
Snehasish Kumar d079dbc591 [CodeGen] Basic block sections should take precendence over splitting.
The use of basic block sections should take precedence over the machine
function splitting pass. Since they use the same underlying mechanism
they are kept exclusive. Updated the tests to check that split machine
functions is overridden by all flavours of basic block sections.

Differential Revision: https://reviews.llvm.org/D96392
2021-02-11 11:14:10 -08:00
Craig Topper 5744502a13 [TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL.
If we wait until the type is legalized, we'll lose information
about the orginal type and need to use larger magic constants.
This gets especially bad on RISCV64 where i64 is the only legal
type.

I've limited this to simple scalar types so it only works for
i8/i16/i32 which are most likely to occur. For more odd types
we might want to do a small promotion to a type where MULH is legal
instead.

Unfortunately, this does prevent some urem/srem+seteq matching since
that still require legal types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96210
2021-02-11 09:43:13 -08:00
Simon Pilgrim 5beebf9c58 [DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI.
Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.
2021-02-11 17:09:01 +00:00
Thomas Preud'homme bad0290ce3 Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-11 14:19:43 +00:00
Joe Ellis 67464dfe36 [DebugInfo] Only perform TypeSize -> unsigned cast when necessary
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D96423
2021-02-11 13:54:09 +00:00
Max Kazantsev 418c218efa Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment"
The patch did not account for one corner case where cmp does not dominate
the loop latch. This patch adds this check, hopefully it's cheap because
the CFG does not change during the transform, so DT queries should be
executed quickly.

If you see compile time slowness from this, please revert.

Differential Revision: https://reviews.llvm.org/D96119
2021-02-11 19:49:23 +07:00
Max Kazantsev 90081f3020 Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment"
This reverts commit 3d15b7e7df.

We've found an internal failure, need to analyze.
2021-02-11 17:52:11 +07:00
Max Kazantsev 3d15b7e7df [Codegenprepare][X86] Use usub with overflow opt for IV increment
Function `replaceMathCmpWithIntrinsic` artificially limits the scope
of the optimization, setting a requirement of two instructions be in
the same block, due to two reasons:
- usage of DT for more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.

Because of this, two semantically equivalent tests may be or not be the
subject of this opt depending on where the binary operation is located.
See `test/CodeGen/X86/usub_inc_iv.ll` for motivation

There is one important particular case where this limitation is  too strict:
it is when the binary operation is the increment of the induction variable.
As result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place IV increment. In most cases, they place
it in the end of the latch block, killing the opt opportunity (when in fact it
does not matter where to insert the actual instruction).

This patch handles this particular case separately.
- The detector does not use dom tree and has constant cost;
- The value of IV or IV.next lives through all loop in any case, so this should not
  create a new unexpected long-living value.

As result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`).

Differential Revision: https://reviews.llvm.org/D96119
Reviewed By: spatel, reames
2021-02-11 11:59:45 +07:00
Kazu Hirata c5e90a8857 [AsmPrinter] Use range-based for loops (NFC) 2021-02-10 20:01:22 -08:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Jeremy Morse 1d68e0a075 Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable
Originally landed in ddc2f1e3fb and reverted in d32deaab4d because of
a Generic test objecting. That was fixed up in 013613964f. Original
landing commit message follows:

[DWARF] Location-less inlined variables should not have DW_TAG_variable

Discussed in this thread:

  https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html

DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.

Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.

Differential Revision: https://reviews.llvm.org/D95617
2021-02-10 15:40:47 +00:00
Luís Marques acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Sander de Smalen 750a78cd5d [ValueTypes] Add MVT for nxv1bf16.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96249
2021-02-10 08:50:41 +00:00
Kazu Hirata 7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Matt Arsenault b72a23650f GlobalISel: Fix using wrong calling convention for callees
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.

For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
2021-02-09 13:48:56 -05:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Nemanja Ivanovic a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Thomas Preud'homme a50ab8672d Revert STRICT_FCMP nonan optimisation
Summary: This reverts commit b7b61a7b5b which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806

Reviewers:

Subscribers:
2021-02-09 11:27:35 +00:00
Thomas Preud'homme b7b61a7b5b Improve STRICT_FSETCC codegen in absence of no NaN
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have Nan.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91972
2021-02-09 11:18:16 +00:00
Matt Arsenault 87e280110d GlobalISel: Use correct calling convention in handleAssignments
This was using the calling convention of the calling function, not the
callee. Avoids regressions in a future patch.
2021-02-08 17:09:28 -05:00
Amara Emerson ec41ed5b1b [AArch64][GlobalISel] Support the 'returned' parameter attribute.
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.

Gives a 0.1% geomean -Os code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D96099
2021-02-08 12:47:39 -08:00
Simon Pilgrim c5c690a835 [DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lamda. NFCI.
This is going to be necessary for a future reuse of MergeInnerShuffle
2021-02-08 14:25:16 +00:00
Nicholas Guy cd880442ae [CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold
Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.

Differential Revision: https://reviews.llvm.org/D95631
2021-02-08 13:28:00 +00:00
Jeremy Morse c1d45abda5 Revert "Re-land D94976 after revert in e29552c5aff6"
Maskray has reported a fault with .debug_gnu_pubnames in the comments on
D94976, caused by this patch, reverting to investigate.

This reverts commit 8998f58435.
2021-02-08 12:41:12 +00:00
Jeremy Morse 6ade2dea7b Revert "DebugInfo: Temporarily work around -gsplit-dwarf + LTO .debug_gnu_pubnames regression after D94976"
Backing out this workaround to focus on fixing whatever's wrong with
.debug_gnu_pubnames, I'll revert the cause, (8998f584) in the next commit.

This reverts commit 56fa34ae35.
2021-02-08 12:41:01 +00:00
Kazu Hirata 7b9f6c2d42 [SelectionDAG] Drop unnecessary const from a return type (NFC)
Identified with const-return-type.
2021-02-07 09:49:33 -08:00
Simon Pilgrim 86dabf4226 [DAG] SelectionDAG::isSplatValue - handle OR/XOR cases
Add OR/XOR to the basic binops that we support when checking for a splat vector value
2021-02-07 13:27:57 +00:00
Fangrui Song e44a100942 .gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36, and drop unneeded unique ID for -fno-unique-section-names
GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above.

If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names.
The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD.
(LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)
2021-02-05 21:45:21 -08:00
Fangrui Song 853a264916 [AsmPrinter] __patchable_function_entries: Set SHF_LINK_ORDER for binutils 2.36 and above
This matches GCC behavior when the configure-time binutils is new. GNU ld<2.36
did not support mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we conservatively disable SHF_LINK_ORDER for <2.36.
2021-02-05 19:53:06 -08:00
Sanjay Patel c981f6f8e1 Revert "[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library"
This reverts commit 2303e93e66.
Investigating bot failures.
2021-02-05 15:10:11 -05:00
Lukas Sommer 2303e93e66 [Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.

Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.

With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library is used.

Differential Revision: https://reviews.llvm.org/D95373
2021-02-05 14:25:19 -05:00
Wouter van Oortmerssen e3c0b0fe09 [WebAssembly] locals can now be indirect in DWARF
This for example to indicate that byval args are represented by a pointer to a struct.
Followup to https://reviews.llvm.org/D94140

Differential Revision: https://reviews.llvm.org/D94347
2021-02-05 11:14:42 -08:00
Amy Huang 34f3249abd [DebugInfo] Fix error from D95893, where I accidentally used an
unsigned int in a loop and it wraps around.

Follow up to https://reviews.llvm.org/D95893
2021-02-05 10:25:21 -08:00
Huihui Zhang 1b81117f88 [DAGCombiner][SVE] Fix invalid use of getVectorNumElements() in visitSRA.
Make sure scalable property is preserved by using getVectorElementCount().

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D95967
2021-02-05 09:56:49 -08:00
Amy Huang a740af4de9 [CodeView][DebugInfo] Update the code for removing template arguments from the display name of a codeview function id.
Previously the code split the string at the first '<', which
incorrectly truncated names like `operator<`.

Differential Revision: https://reviews.llvm.org/D95893
2021-02-05 09:49:11 -08:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Guillaume Chatelet 4b15156dca [NFC] inline variable 2021-02-05 10:17:02 +00:00
Kazu Hirata 5438e079b1 [GlobalISel] Use ListSeparator (NFC) 2021-02-04 21:18:04 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Fangrui Song 56fa34ae35 DebugInfo: Temporarily work around -gsplit-dwarf + LTO .debug_gnu_pubnames regression after D94976
`-flto -gsplit-dwarf -g -O[123]` may create .debug_gnu_pubnames with 0 DIE
offset entries. llvm-dwarfdump -debug-gnu-pubnames/ld.lld --gdb-index errors for that.

```
        .section        .debug_gnu_pubnames,"",@progbits
        .long   .LpubNames_end2-.LpubNames_begin2 # Length of Public Names Info
.LpubNames_begin2:
        .short  2                               # DWARF Version
        .long   .Lcu_begin2                     # Offset of Compilation Unit Info
        .long   57                              # Compilation Unit Length
        .long   0                               # DIE offset
        .byte   16                              # Attributes: TYPE, EXTERNAL
        .asciz  "absl"                          # External Name
        .long   0                               # DIE offset
        .byte   16                              # Attributes: TYPE, EXTERNAL
        .asciz  "absl::base_internal"           # External Name
        .long   0                               # End Mark
```
2021-02-04 17:35:09 -08:00
Craig Topper 8cc9c42a0c [TargetLowering] Use LegalOnly operand to isOperationLegalOrCustom to simplify some code. NFC 2021-02-04 12:30:37 -08:00
Sanjay Patel 056d31dd2a [ExpandReductions] fix FMF requirement for fmin/fmax
The upstream callers (the vectorizers) were fixed with:
bbed5f2f8a ( D95690 )
77adbe6a8c

We should remove this pass entirely now that reduction
legalization/lowering is expected to work just as well,
but we need to confirm that the shuffle ops do not
regress (for x86 in particular).

This should be the last step needed to close:
https://llvm.org/PR23116
2021-02-04 13:32:08 -05:00
Jeremy Morse 8998f58435 Re-land D94976 after revert in e29552c5af
This modified patch avoids redirecting the unit in which a subprogram is
created if type units are enabled -- DIEs were getting children allocated
from different units memory pools. Original commit message:

[DWARF] Create subprogram's DIE in DISubprogram's unit

This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.

Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.

Differential Revision: https://reviews.llvm.org/D94976
2021-02-04 11:17:18 +00:00