This allows functions and globals to to be reordered later in the linking phase
(using the -symbol-ordering-file) even though reordering will be limited to
the scope of the explicit section.
Patch by Rahman Lavaee!
Differential Revision: https://reviews.llvm.org/D65478
llvm-svn: 367501
X86 at least is able to use movmsk or kmov to move the mask to the scalar
domain. Then we can just use test instructions to test individual bits.
This is more efficient than extracting each mask element
individually.
I special cased v1i1 to use the previous behavior. This avoids
poor type legalization of bitcast of v1i1 to i1.
I've skipped expandload/compressstore as I think we need to
handle constant masks for those better first.
Many tests end up with duplicate test instructions due to tail
duplication in the branch folding pass. But the same thing
happens when constructing similar code in C. So its not unique
to the scalarization.
Not sure if this lowering code will also be good for other targets,
but we're only testing X86 today.
Differential Revision: https://reviews.llvm.org/D65319
llvm-svn: 367489
Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context.
Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper
Reviewed By: spatel
Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji
Differential Revision: https://reviews.llvm.org/D65170
llvm-svn: 367486
Summary:
This will make it possible to improve IPRA by taking into account
register usage in indirect calls.
NFC yet; this is just laying the groundwork to start building
up patches to take advantage of the information for improved register
allocation.
Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65488
llvm-svn: 367476
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
to bail out in store merging dependence check.
We run into a case where dependence check in store merging bail out many times
for the same store and root nodes in a huge basicblock. That increases compile
time by almost 100x. The patch add a map to track how many times the bailing
out happen for the same store and root, and if it is over a limit, stop
considering the store with the same root as a merging candidate.
Differential Revision: https://reviews.llvm.org/D65174
llvm-svn: 367472
Add an option to control whether or not to enable store merging in dag combiner
so we can workaround some bugs more easily.
Differential Revision: https://reviews.llvm.org/D65482
llvm-svn: 367365
Addresses number of comment made on D64652 after commiting:
- Reorders function decls in the TargetLoweringObjectFileXCOFF class.
- Fix comment in MCSectionXCOFF to include description of external reference
csects.
- Convert several llvm_unreachables to report_fatal_error
- Convert several dyn_casts to casts as they are expected not to fail.
- Avoid copying DataLayout object.
llvm-svn: 367324
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.
This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.
Fixes PR42777
llvm-svn: 367174
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.
This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it.
This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.
llvm-svn: 367171
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.
llvm-svn: 367169
Adds machine operand lowering for MCSymbolSDNodes to the PowerPC
backend. This is needed to produce call instructions in assembly for AIX
because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for
asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.
Differential Revision: https://reviews.llvm.org/D63738
llvm-svn: 367133
Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all.
llvm-svn: 367092
Summary:
In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.
Below is an example
```
BB: | BB:
XOR 3, 3, 4 | XOR 3, 3, 4
B TBB | B ChainBB
... | ...
ChainBB: | ChainBB:
B TBB | ADD 3, 3, 4
... | BLR
TBB: |
ADD 3, 3, 4 |
BLR |
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D63972
llvm-svn: 367080
This allows every serializer format to implement metaSerializer() and
return the corresponding meta serializer.
Original llvm-svn: 366946
Reverted llvm-svn: 367004
This fixes the unit tests on Windows bots.
llvm-svn: 367078
Currently, stack protector loads and stores are resolved during
LocalStackSlotAllocation (if the pass needs to run). When this is the
case, the base register assigned to the frame access is going to be one
of the vregs created during LocalStackSlotAllocation. This means that we
are keeping a pointer to the stack protector slot, and we're using this
pointer to load and store to it.
In case register pressure goes up, we may end up spilling this pointer
to the stack, which can be a security concern.
Instead, leave it to PEI to resolve the frame accesses. In order to do
that, we make all stack protector accesses go through frame index
operands, then PEI will resolve this using an offset from sp/fp/bp.
Differential Revision: https://reviews.llvm.org/D64759
llvm-svn: 367068
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.
Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.
Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
Reviewed By: spatel
Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62871
llvm-svn: 366955
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.
The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.
Differential Revision: https://reviews.llvm.org/D65167
llvm-svn: 366951
r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard
code the result type of the new legalized G_SELECT. Change this to instead use
type of the original G_ICMP result and allow the target to legalize it if necessary
later.
llvm-svn: 366943
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:
e.g. <8 x i32> reduction pattern in a <16 x i32> vector:
<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.
Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
Differential Revision: https://reviews.llvm.org/D65047
llvm-svn: 366933
If we are already using the same chain for the old/new memory ops then just return.
Fixes PR42727 which had getLoad() reusing an existing node.
llvm-svn: 366922
If all the demanded elts are from one operand and are inline, then we can use the operand directly.
The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.
llvm-svn: 366817
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
We do see a couple of regressions that need to be addressed:
AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
The code owners have confirmed its ok for these cases to fixed up in future patches.
Differential Revision: https://reviews.llvm.org/D63281
llvm-svn: 366799
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.
llvm-svn: 366779
We were silently using the ABI alignment for all of the stores generated for deopt and gc values. We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to a unaligned location.
The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help. So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.
Note that there's a separate performance possibility here. Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths. But that's a later patch, if at all. :)
llvm-svn: 366765
Stubs out a TargetLoweringObjectFileXCOFF class, implementing only
SelectSectionForGlobal for common symbols. Also adds an override of
EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors
and adds support for emitting common globals.
llvm-svn: 366727
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.
Differential revision: https://reviews.llvm.org/D64986
llvm-svn: 366669
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
be illegal.
* There is no ban in Hacker's Delight
* I think the ban came from the same bug that caused the miscompile
in the base patch - in `floor((2^W - 1) / D)` we were dividing by
`D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
that the fold is invalid for `D = 0`. Further considerations:
* We know that
* `(X u% 1) == 0` can be constant-folded to `1`,
* `(X u% 1) != 0` can be constant-folded to `0`,
* Also, we know that
* `X u<= -1` can be constant-folded to `1`,
* `X u> -1` can be constant-folded to `0`,
* https://godbolt.org/z/7jnZJXhttps://rise4fun.com/Alive/oF6p
* We know will end up with the following:
`(setule/setugt (rotr (mul N, P), K), Q)`
* Therefore, for given new DAG nodes and comparison predicates
(`ule`/`ugt`), we will still produce the correct answer if:
`Q` is a all-ones constant; and both `P` and `K` are *anything*
other than `undef`.
* The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
their values for the lanes where divisor was `1`.
Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
Reviewed By: RKSimon
Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63963
llvm-svn: 366637
The top-level BUNDLE instruction should behave as an ordinary
instruction. It is supposed to have all relevant registers as implicit
operands. Moving it should work as any other instruction. I believe
the assert intended to avoid moving instructions inside bundles.
llvm-svn: 366605
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.
llvm-svn: 366574
Summary:
Current PRE hoists common computations into
CMBB = DT->findNearestCommonDominator(MBB, MBB1).
However, if CMBB is in a hot loop body, we might get performance
degradation.
Differential Revision: https://reviews.llvm.org/D64394
llvm-svn: 366570
If a function definition is not exact, then the linker could select a
differently-compiled version of it, which could use different registers.
https://reviews.llvm.org/D64909
llvm-svn: 366557
Summary:
Inline asm doesn't use labels when compiled as an object file. Therefore, we
shouldn't create one for the (potential) callbr destination. Instead, use the
symbol for the MachineBasicBlock.
Reviewers: nickdesaulniers, craig.topper
Reviewed By: nickdesaulniers
Subscribers: xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64888
llvm-svn: 366523
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516
This is a small extension of !associated, mostly useful for the implementation
convenience of instrumentation passes that RAUW globals with aliases, such
as LowerTypeTests.
Differential Revision: https://reviews.llvm.org/D64951
llvm-svn: 366502
The LocalStackSlotPass pre-allocates a stack protector and makes sure
that it comes before the local variables on the stack.
We need to make sure that later during PEI we don't re-allocate a new
stack protector slot. If that happens, the new stack protector slot will
end up being **after** the local variables that it should be protecting.
Therefore, we would have two slots assigned for two different stack
protectors, one at the top of the stack, and one at the bottom. Since
PEI will overwrite the assigned slot for the stack protector, the load
that is used to compare the value of the stack protector will use the
slot assigned by PEI, which is wrong.
For this, we need to check if the object is pre-allocated, and re-use
that pre-allocated slot.
Differential Revision: https://reviews.llvm.org/D64757
llvm-svn: 366371
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.
Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.
llvm-svn: 366367
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.
I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.
llvm-svn: 366366
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.
This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table
Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.
llvm-svn: 366329
This patch sets correct encodings for DWARF exception handling for RISC-V
(other than call site encoding, which must be udata4 rather than uleb128 and
is handled by D63415).
This has the same intend as D63409, except this version matches GCC/binutils
behaviour which uses the same encodings regardless of PIC/non-PIC and
medlow/medany code model.
llvm-svn: 366327
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.
Differential Revision: https://reviews.llvm.org/D64856
llvm-svn: 366317
AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.
The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.
llvm-svn: 366279
D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for
inline sites. However, that change wasn't aware of "-gno-column-info".
To avoid adding column info when "-gno-column-info" is used, now
DW_AT_call_column is only added when we have non-zero column (when
"-gno-column-info" is used, column will be zero).
Patch by Wenlei He!
Differential Revision: https://reviews.llvm.org/D64784
llvm-svn: 366264
Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling. Specifially, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.
Differential Revision: https://reviews.llvm.org/D64412
Reviewed By: cameron.mcinally
llvm-svn: 366222
Summary:
As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b.
Depends on D57302
Reviewers: hfinkel, RKSimon, craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59208
llvm-svn: 366214
Summary:
We agreed to rename `except_ref` to `exnref` for consistency with other
reference types in
https://github.com/WebAssembly/exception-handling/issues/79. This also
renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order
to use the file for other reference types in future.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64703
llvm-svn: 366145
The column field is missing for all inline sites, currently it's always
zero. This changes populates DW_AT_call_column field for inline sites.
Test case modified to cover this change.
Patch by: Wenlei He
Differential revision: https://reviews.llvm.org/D64033
llvm-svn: 365945
Summary:
Problem exposed in PowerPC functional testing.
We did not consider Anti dependence for nodes in same cycle,
so we may end up generating bad machine code.
eg: the reduced test won't verify.
*** Bad machine code: Using an undefined physical register ***
- function: lame_encode_buffer_interleaved
- basic block: %bb.4 (0x4bde4e12928)
- instruction: %29:gprc = ADDZE %27:gprc, implicit-def dead $carry, implicit $carry
- operand 3: implicit $carry
Reviewers: bcahoon, kparzysz, hfinkel
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64192
llvm-svn: 365859
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).
This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.
In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.
The fast-isel-store.ll load folding regression is annoying but I don't think is that critical.
Differential Revision: https://reviews.llvm.org/D63653
llvm-svn: 365785
If we have:
R = sub X, Y
P = cmp Y, X
...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.
Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.
There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".
Differential Revision: https://reviews.llvm.org/D63958
llvm-svn: 365711
Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes
obstruct attempts to find constant source values. These usually come about when
try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough
about this operation allows the CBZ/CBNZ optimization to catch more cases.
This change also improves the case where we can't find a constant source at all.
Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit
a cmp and conditional branch, saving an instruction.
The cumulative code size improvement of this change plus D64354 is 5.5% geomean
on arm64 CTMark -O0.
Differential Revision: https://reviews.llvm.org/D64377
llvm-svn: 365690
Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it. Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context.
Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm
Reviewed By: arsenm
Subscribers: michele.scandale, wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D64450
llvm-svn: 365679
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
llvm-svn: 365583
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
If we have an icmp->brcond->br sequence where the brcond just branches to the
next block jumping over the br, while the br takes the false edge, then we can
modify the conditional branch to jump to the br's target while inverting the
condition of the incoming icmp. This means we can eliminate the br as an
unconditional branch to the fallthrough block.
Differential Revision: https://reviews.llvm.org/D64354
llvm-svn: 365510
Select gprb or fprb when def/use register operand of G_PHI is
used/defined by either:
copy to/from physical register or
instruction with only one mapping available for that use/def operand.
Integer s64 phi is handled with narrowScalar when mapping is applied,
produced artifacts are combined away. Manually set gprb to all register
operands of instructions created during narrowScalar.
Differential Revision: https://reviews.llvm.org/D64351
llvm-svn: 365494
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
Dump the DWARF information about call sites and call site parameters into
debug info sections.
The patch also provides an interface for the interpretation of instructions
that could load values of a call site parameters in order to generate DWARF
about the call site parameters.
([13/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60716
llvm-svn: 365467
DAGTypeLegalizer and SelectionDAGLegalize has helper
functions wrapping the call to TLI.getSetCCResultType(...).
Use those helpers in more places.
llvm-svn: 365456
Summary:
Make sure we use SETGE instead of SETGT when checking
if the sign bit is zero at SMULFIXSAT expansion.
The faulty expansion occured when doing "expand" of
SMULFIXSAT and the scale was exactly matching the
size of the smaller type. For example doing
i64 Z = SMULFIXSAT X, Y, 32
and expanding X/Y/Z into using two i32 values.
The problem was that we sometimes did not saturate
to min when overflowing.
Here is an example using Q3.4 numbers:
Consider that we are multiplying X and Y.
X = 0x80 (-8.0 as Q3.4)
Y = 0x20 (2.0 as Q3.4)
To avoid loss of precision we do a widening
multiplication, getting a 16 bit result
Z = 0xF000 (-16.0 as Q7.8)
To detect negative overflow we should check if
the five most significant bits in Z are less than -1.
Assume that we name the 4 most significant bits
as HH and the next 4 bits as HL. Then we can do the
check by examining if
(HH < -1) or (HH == -1 && "sign bit in HL is zero").
The fault was that we have been doing the check as
(HH < -1) or (HH == -1 && HL > 0)
instead of
(HH < -1) or (HH == -1 && HL >= 0).
In our example HH is -1 and HL is 0, so the old
code did not trigger saturation and simply truncated
the result to 0x00 (0.0). With the bugfix we instead
detect that we should saturate to min, and the result
will be set to 0x80 (-8.0).
Reviewers: leonardchan, bevinh
Reviewed By: leonardchan
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64331
llvm-svn: 365455
Emit replacements for clobbered parameters location if the parameter
has unmodified value throughout the funciton. This is basic scenario
where we can use the debug entry values.
([12/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D58042
llvm-svn: 365444
Summary:
This is exposed by functional testing on PowerPC.
In some pipelined loops, Phi refer to phi did not get value defined by
the Phi, hence getting wrong value later.
As the comment mentioned, we should "use the value defined by the Phi,
unless we're generating the firstepilog and the Phi refers to a Phi
in a different stage.", so Phi refering to same stage Phi should use
the value defined by the Phi here.
Reviewers: bcahoon, hfinkel
Reviewed By: hfinkel
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64035
llvm-svn: 365428
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.
Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.
Addresses PR42491
Reviewers: compnerd
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64109
llvm-svn: 365387
This is extremly slow on AMDGPU, which has a lot of physical register
and a lot of register classes.
determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed already
added all of the super registers to the saved set.
llvm-svn: 365370
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).
A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.
llvm-svn: 365309
Some out of tree backend require larger vector type. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type we are adding it even if no backend currently use it.
Differential Revision: https://reviews.llvm.org/D64141
Patch by Thomas Raoux!
llvm-svn: 365274
Although removeCopyByCommutingDef deals with full copies, it is still
possible to copy undef lanes and thus, we wouldn't have any a value
number for these lanes.
This fixes PR40215.
llvm-svn: 365256
I'm not sure if it's worth it or not to add a hook to disable the pass
for an arbitrary function.
This pass is taking up to 5% of compile time in tiny programs by
iterating through all of the physical registers in every register
class. This pass should be rewritten in terms of regunits. For now,
skip doing anything for entry point functions. The vast majority of
functions in the real world aren't callable, so just not running this
will give the majority of the benefit.
llvm-svn: 365255
When looking for uses/defs to add kill flags, the iterator was double
incremented, skipping the first instruction in the bundle. The use
register in the first bundle instruction was then incorrectly killed.
The "First" instruction should be the BUNDLE itself as the proper
reverse iterator endpoint.
llvm-svn: 365216
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
llvm-svn: 365198
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.
Reviewers: spatel, RKSimon, deadalnix
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64190
llvm-svn: 365149
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.
This is an attempt to fix it.
1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`
2) After that we hit an assert: ``` Assertion failed: (Op.getValueType()
!= MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue
operands should occur at end of operand list!"), function AddOperand,
file
/Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp,
line 461. ```
The chain operand was added at the beginning of the operand list. Move
that to the end.
3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.
PR39436
Differential Review: https://reviews.llvm.org/D54155
llvm-svn: 365058
Summary:
This diff improve the capability of DAGCOmbine to generate linear carries propagation in presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded one.
Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.
Reviewers: hfinkel, RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57302
llvm-svn: 365051
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
Differential revision: https://reviews.llvm.org/D64131
llvm-svn: 365043
For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow
encoding. However, current allocation order is like:
R0-R3, R12, LR, R4-R11
As a result, a lot of instructs that use R12/LR will be wide instrs.
This patch changes the allocation order to:
R0-R7, R12, LR, R8-R11
for thumb2 and -Osize.
In most cases, there is no extra push/pop instrs as they will be folded
into existing ones. There might be slight performance impact due to more
stack usage, so we only enable it when opt for min size.
https://reviews.llvm.org/D30324
llvm-svn: 365014
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
The internal option added with r323870 has a typo. It isn't being used
by any tests, but I decided to fix the spelling and leave it in for use
in debugging the changes added in that patch.
llvm-svn: 364958
The code for duplicating instructions could sometimes try to emit copies
intended to deal with unconstrainable register classes to the tail block of the
original instruction, rather than before the newly cloned instruction in the
predecessor block.
This was exposed by GlobalISel on arm64.
Differential Revision: https://reviews.llvm.org/D64049
llvm-svn: 364888
For a given floating point load / store pair, if the load value isn't used by any other operations,
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.
And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair
so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer.
I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target.
Differential Revision: https://reviews.llvm.org/D60601
llvm-svn: 364883
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.
llvm-svn: 364841
https://reviews.llvm.org/D31359
Add a hook "legalizeInstrinsic" to allow backends to override this
and custom lower/legalize intrinsics.
llvm-svn: 364821
Fix stack-use-after-scope errors from r364512. One instance was already
fixed in r364611 - this patch simplifies that fix and addresses one more
instance of similar code.
Discussed in: https://reviews.llvm.org/D63905
llvm-svn: 364778
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.
llvm-svn: 364743
This patch addresses PR41675, where a stack-pointer variable is dereferenced
too many times by its location expression, presenting a value on the stack as
the pointer to the stack.
The difference between a stack *pointer* DBG_VALUE and one that refers to a
value on the stack, is currently the indirect flag. However the DWARF backend
will also try to guess whether something is a memory location or not, based
on whether there is any computation in the location expression. By simply
prepending the stack offset to existing expressions, we can accidentally
convert a register location into a memory location, which introduces a
suprise (and unintended) dereference.
The solution is to add DW_OP_stack_value whenever we add a DIExpression
computation to a stack *pointer*. It's an implicit location computed on the
expression stack, thus needs to be flagged as a stack_value.
For the edge case where the offset is zero and the location could be a register
location, DIExpression::prepend will still generate opcodes, and thus
DW_OP_stack_value must still be added.
Differential Revision: https://reviews.llvm.org/D63429
llvm-svn: 364736
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
to generate intrinsics that guard the loop entry, as well as setting
the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
instruction.
Differential Revision: https://reviews.llvm.org/D63816
llvm-svn: 364733
Introduce llvm.test.set.loop.iterations which sets the loop counter
and also produces an i1 after testing that the count is not zero.
Differential Revision: https://reviews.llvm.org/D63809
llvm-svn: 364628
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
llvm-svn: 364613
This patch intends to fix ASAN stack-use-after-scope error.
This is at least a short-term fix to unbreak LLVM's mainline.
Differential Revision: https://reviews.llvm.org/D63905
llvm-svn: 364611
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364600
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
Original patch author: @hermord (Dmytro Shynkevych)!
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364563
Emit replacements for clobbered parameters location if the parameter
has unmodified value throughout the funciton. This is basic scenario
where we can use the debug entry values.
([12/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D58042
llvm-svn: 364553
Add the IR and the AsmPrinter parts for handling of the DW_OP_entry_values
DWARF operation.
([11/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60866
llvm-svn: 364542
Handle call instruction replacements and deletions in order to preserve
valid state of the call site info of the MachineFunction.
NOTE: If the call site info is enabled for a new target, the assertion from
the MachineFunction::DeleteMachineInstr() should help to locate places
where the updateCallSiteInfo() should be called in order to preserve valid
state of the call site info.
([10/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61062
llvm-svn: 364536
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.
([9/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60715
llvm-svn: 364516
Once MIR code leaves SSA form and the liveness of a vreg is considered,
DBG_VALUE insts are able to refer to non-live vregs, because their
debug-uses do not contribute to liveness. This non-liveness becomes
problematic for optimizations like register coalescing, as they can't
``see'' the debug uses in the liveness analyses.
As a result registers get coalesced regardless of debug uses, and that can
lead to invalid variable locations containing unexpected values. In the
added test case, the first vreg operand of ADD32rr is merged with various
copies of the vreg (great for performance), but a DBG_VALUE of the
unmodified operand is blindly updated to the modified operand. This changes
what value the variable will appear to have in a debugger.
Fix this by changing any DBG_VALUE whose operand will be resurrected by
register coalescing to be a $noreg DBG_VALUE, i.e. give the variable no
location. This is an overapproximation as some coalesced locations are
safe (others are not) -- an extra domination analysis would be required to
work out which, and it would be better if we just don't generate non-live
DBG_VALUEs.
This fixes PR40010.
Differential Revision: https://reviews.llvm.org/D56151
llvm-svn: 364515
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
llvm-svn: 364514
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63551
llvm-svn: 364512
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63550
llvm-svn: 364511
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
Allow CallLowering::ArgInfo to contain more than one virtual register.
This is useful when passes split aggregates into several virtual
registers, but need to also provide information about the original type
to the call lowering. Used in follow-up patches.
Differential Revision: https://reviews.llvm.org/D63548
llvm-svn: 364509
Add an attribute into the MachineFunction that tracks call site info.
([8/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61061
llvm-svn: 364506
A unique DISubprogram may be attached to a function declaration used for
call site debug info.
([6/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60713
llvm-svn: 364500
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
llvm-svn: 364418
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
llvm-svn: 364412
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8)))
llvm-svn: 364405
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.
This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.
Differential Revision: https://reviews.llvm.org/D63536
llvm-svn: 364353
Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one *user*, but instead returned true only when there was one *use*. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one.
All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection.
Differential Revision: https://reviews.llvm.org/D63656
llvm-svn: 364336
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)
This is the inverted predicate sibling pattern that was added with:
D63004
This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.
llvm-svn: 364319
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
Matches what we already do for ZERO_EXTEND.
Reapplies rL363850 but now with legality checks added at rL364290
llvm-svn: 364303
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.
llvm-svn: 364302
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.
Matches what we already do for SIGN_EXTEND.
Reapplies rL363802 but now with legality checks added at rL364290
llvm-svn: 364299
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.
This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.
Part of the yak shaving to solve the crashes from rL364264 and rL364272
llvm-svn: 364295
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
We'll improve X86 legality in future commits.
llvm-svn: 364290
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.
Reviewers: RKSimon, craig.topper, xbolva00, spatel
Reviewed By: xbolva00, spatel
Subscribers: xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63390
llvm-svn: 364286
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.
The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.
llvm-svn: 364264
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.
Differential Revision: https://reviews.llvm.org/D63463
llvm-svn: 364221
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.
Fixes OSS Fuzz #15429
llvm-svn: 364179
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.
The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.
llvm-svn: 364151
GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp.
This fixes a link error in -DBUILD_SHARED_LIBS=on builds:
ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear()
>>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198)
>>> lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction())
llvm-svn: 364124
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.
Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.
There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.
Actual arm64 support for selection of jump tables is coming in a later patch.
Differential Revision: https://reviews.llvm.org/D63169
llvm-svn: 364085
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.
This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.
Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.
Differential Revision: https://reviews.llvm.org/D63630
llvm-svn: 364001
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.
llvm-svn: 363924
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....
llvm-svn: 363887
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.
llvm-svn: 363792
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363713
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363710
This adds vector splitting for vaarg instructions during type legalization
Committed on behalf of @luke (Luke Lau)
Differential Revision: https://reviews.llvm.org/D60762
llvm-svn: 363671
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.
Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63235
llvm-svn: 363642
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.
Note the expansion itself isn't correct without nnan, but that should
be fixed separately.
llvm-svn: 363637
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.
Differential Revision: https://reviews.llvm.org/D63119
llvm-svn: 363635
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.
Reviewers: volkan, spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D63458
llvm-svn: 363631
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.
We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61813
llvm-svn: 363616
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
I fixed the same bug in SelectionDAG in r287593.
llvm-svn: 363580
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338
llvm-svn: 363566
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.
Differential Revision: https://reviews.llvm.org/D63336
llvm-svn: 363556
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.
llvm-svn: 363542
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.
This extends CGP functionality introduced with:
rL201655
Differential Revision: https://reviews.llvm.org/D63233
llvm-svn: 363511
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.
I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.
llvm-svn: 363482
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.
We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.
Differential Revision: https://reviews.llvm.org/D63159
llvm-svn: 363434
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.
The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.
For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.
llvm-svn: 363410
Avoid a check for valid and a set of redundant asserts. The place
InstructionMapping is constructed asserts all of the default fields
are passed anyway for an invalid mapping, so don't overcomplicate
this.
llvm-svn: 363391
This is consistent with GCC's behavior (which is the defacto standard
for pubnames). Though I find the presence of enumerators from enum
classes to be a bit confusing, possibly a bug on GCC's end (since they
can't be named unqualified, unlike the other names - and names nested in
classes don't go in pubnames, for instance - presumably because one must
name the class first & that's enough to limit the scope of the search)
llvm-svn: 363349
Summary:
Before it was using the fully qualified name only for static data members.
Now it does for all variable names to match MSVC.
Reviewers: rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63012
llvm-svn: 363335
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs are from
a user instruction that could be in a different block.
Fixes PR40887.
Differential Revision: https://reviews.llvm.org/D63286
llvm-svn: 363331
This was exposed by PowerPC target enablement.
In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.
When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we did not skip
them, so we still assume all the nodes have been sorted and in order in
Indices array. So when we call lower_bound() for ExitSU, it will return
Indices.end(), causing memory issues in following Node access.
Differential Revision: https://reviews.llvm.org/D63282
llvm-svn: 363329
Summary:
I found the following case having tail blocks with no successors merging opportunities after block placement.
Before block placement:
bb0:
...
bne a0, 0, bb2:
bb1:
mv a0, 1
ret
bb2:
...
bb3:
mv a0, 1
ret
bb4:
mv a0, -1
ret
The conditional branch bne in bb0 is opposite to beq.
After block placement:
bb0:
...
beq a0, 0, bb1
bb2:
...
bb4:
mv a0, -1
ret
bb1:
mv a0, 1
ret
bb3:
mv a0, 1
ret
After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size.
Author of original patch: Jim Lin
Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen
Reviewed By: Jim, dmgreen
Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54411
llvm-svn: 363284
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472
The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.
As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.
Reviewers: dmgreen, qcolombet
Reviewed By: qcolombet
Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63152
llvm-svn: 363265
This patch makes the LiveDebugValues pass consider fragments when propagating
DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable
location is added to the open-ranges key, which allows distinct fragments to be
tracked separately. To handle overlapping fragments things become slightly
funkier. To avoid excessive searching for overlaps in the data-flow part of
LiveDebugValues, this patch:
* Pre-computes pairings of fragments that overlap, for each DILocalVariable
* During data-flow, whenever something happens that causes an open range to
be terminated (via erase), any fragments pre-determined to overlap are
also terminated.
The effect of which is that when encountering a DBG_VALUE fragment that
overlaps others, the overlapped fragments do not get propagated to other
blocks. We still rely on later location-list building to correctly handle
overlapping fragments within blocks.
It's unclear whether a mixture of DBG_VALUEs with and without fragmented
expressions are legitimate. To avoid suprises, this patch interprets a
DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment.
Differential Revision: https://reviews.llvm.org/D62904
llvm-svn: 363256
Since the DebugLocEntry::Value is used as part of DwarfDebug and
DebugLocEntry make it as the separate class.
Reviewers: aprantl, dstenb
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D63213
llvm-svn: 363246
We aim to ignore changes in variable locations during the prologue and
epilogue of functions, to avoid using space documenting location changes
that aren't visible. However in D61940 / r362951 this got ripped out as
the previous implementation was unsound.
Instead, use the FrameDestroy flag to identify when we're in the epilogue
of a function, and ignore variable location changes accordingly. This fits
in with existing code that examines the FrameSetup flag.
Some variable locations get shuffled in modified tests as they now cover
greater ranges, which is what would be expected. Some additional
single-location variables are generated too. Two tests are un-xfailed,
they were only xfailed due to r362951 deleting functionality they depended
on.
Apparently some out-of-tree backends don't accurately maintain FrameDestroy
flags -- if you're an out-of-tree maintainer and see changes in variable
locations disappear due to a faulty FrameDestroy flag, it's safe to back
this change out. The impact is just slightly more debug info than necessary.
Differential Revision: https://reviews.llvm.org/D62314
llvm-svn: 363245
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
This was using its own, outdated list of possible captures. This was
at minimum not catching cmpxchg and addrspacecast captures.
One change is now any volatile access is treated as capturing. The
test coverage for this pass is quite inadequate, but this required
removing volatile in the lifetime capture test.
Also fixes some infrastructure issues to allow running just the IR
pass.
Fixes bug 42238.
llvm-svn: 363169
Summary:
Fix hoisting to basic block which are not legal for hoisting cause
it can be terminated by exception or it is return block.
Reviewers: john.brawn, RKSimon, MatzeB
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63148
llvm-svn: 363164
This opcode generates a pointer to the address of the jump table
specified by the source operand, which is a jump table index.
It will be used in conjunction with an upcoming G_BRJT opcode to support
jump table codegen with GlobalISel.
Differential Revision: https://reviews.llvm.org/D63111
llvm-svn: 363096
Implement necessary target hooks to enable MachinePipeliner for P9 only.
The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9.
Differential Revision: https://reviews.llvm.org/D62164
llvm-svn: 363085
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:
llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60261
llvm-svn: 363035
An earlier fix of a subtle iterator invalidation bug had uncovered a
nondeterminism that was present in the MultiUsers bag. Problem was that
MultiUsers was being looked up using pointers.
This patch is an NFC change that numbers each multiuser and processes each in
numbered order. This fixes the test failure on netbsd and will likely fix the
green-dragon bot too.
llvm-svn: 363012
If the source is undef, then just don't do anything.
This matches SelectionDAG's behaviour in SelectionDAG.cpp.
Also add a test showing that we do the right thing here.
(irtranslator-memfunc-undef.ll)
Differential Revision: https://reviews.llvm.org/D63095
llvm-svn: 362989
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.
This re-enables it for FastISel with the corresponding fix.
This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.
FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.
This fix moves the replacement of registers before the emission of
copies for the live-ins.
The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).
A few tests are affected by this:
* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
Differential Revision: https://reviews.llvm.org/D62361
llvm-svn: 362963
This commit reapplies r359426 (which was reverted in r360301 due to
performance problems) and rolls in D61940 to address the performance problem.
I've combined the two to avoid creating a span of slow-performance, and to
ease reverting if more problems crop up.
The summary of D61940: This patch removes the "ChangingRegs" facility in
DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect
variable locations. An unchanging register doesn't mean a variable doesn't
change its location.
The patch kills off everything that calculates the ChangingRegs vector.
Previously ChangingRegs spotted epilogues and marked registers as unchanging if
they weren't modified outside the epilogue, increasing the chance that we can
emit a single-location variable record. Without this feature,
debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be
re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've
split this into two steps as FrameDestroy isn't necessarily supported by all
backends.
The logic for terminating variable locations at the end of a basic block now
becomes much more enjoyably simple: we just terminate them all.
Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term.
The current algorithm for detecting that a variable has a single-location
doesn't work in this scenario (inlined function in multiple blocks), only other
bugs were making this test work. fission-ranges.ll gets slightly refreshed too,
as the location of "p" is now correctly determined to be a single location.
Differential Revision: https://reviews.llvm.org/D61940
llvm-svn: 362951
Variable's stack location can stretch longer than it should. If a
variable is placed at the stack in a some nested basic block its range
can be calculated to be up to the next occurrence of the variable's
DBG_VALUE, or up to the end of the function, thus covering a basic
blocks that should not be included in the variable’s location range.
This happens because the DbgEntityHistoryCalculator ends register
locations at the end of a basic block only if the variable’s location
register has been changed throughout the function, which is not the
case for the register used to reference stack objects.
This patch also tries to produce a single value location if the location
list builder managed to merge all the locations into one.
Reviewers: aprantl, dstenb, jmorse
Reviewed By: aprantl, dstenb, jmorse
Subscribers: djtodoro, ivanbaev, asowda
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D61600
llvm-svn: 362923
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
This is fixed recommit of r361356 after PowerPC64 multistage build failure.
llvm-svn: 362901
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
llvm-svn: 362805
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.
Fixes bug 42162.
llvm-svn: 362800
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage.
Range for Debug Variable("i") computed according to current state of instructions
inside of basic block. But Register Allocator creates new instructions which were not taken
into account when Live Debug Variables computed. In the result DBG_VALUE instruction for
the "i" variable was put after these newly inserted instructions. This is incorrect.
Debug Value for the loop counter should be inserted before any loop instruction.
Differential Revision: https://reviews.llvm.org/D62650
llvm-svn: 362750
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:
1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.
2 - The merged load\store node must retain the non-temporal flag.
Differential Revision: https://reviews.llvm.org/D62910
llvm-svn: 362723
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)
Differential Revision: https://reviews.llvm.org/D62272
llvm-svn: 362612
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
When LiveDebugValues deduces new variable's location from spill, restore or
register copy instruction it should close old variable's location. Otherwise
we can have multiple block output locations for same variable. That could lead
to inserting two DBG_VALUEs for same variable to the beginning of the successor
block which results to ignoring of first DBG_VALUE.
Reviewers: aprantl, jmorse, wolfgangp, dstenb
Reviewed By: aprantl
Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D62196
llvm-svn: 362373
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.
llvm-svn: 362315
Over a year ago, MachineInstr gained a fourth boolean parameter that occurs
before the TII pointer. When this happened, several places started accidentally
passing TII into this boolean parameter instead of the TII parameter.
llvm-svn: 362312
We were hashing the string pointer, not the string, so two instructions
could be identical (isIdenticalTo), but have different hash codes.
This showed up as a very rare, non-deterministic assertion failure
rehashing a DenseMap constructed by MachineOutliner. So there's no
"real" testcase, just a unittest which checks that the hash function
behaves correctly.
I'm a little scared fixing this is going to cause a regression in
outlining or MachineCSE, but hopefully we won't run into any issues.
Differential Revision: https://reviews.llvm.org/D61975
llvm-svn: 362281
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.
Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.
llvm-svn: 362269