Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are in held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.
However we notice some incorrect results since function attributes are
not correctly written in TargetMachine.Options when next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0,
it has "no-nans-fp-math"="false" but TargetMachine.Options still has it
set to true since first function in test file had this attribute set to
true. This will be fixed in D87511.
Differential Revision: https://reviews.llvm.org/D87456
Add a DBG_INSTR_REF instruction and a "debug instruction number" field to
MachineInstr. The two allow variable values to be specified by
identifying where the value is computed, rather than the register it lies
in, like so:
%0 = fooinst, debug-instr-number 1
[...]
DBG_INSTR_REF 1, 0
See the original RFC for motivation:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
This patch is NFCI; it only adds fields and other boiler plate.
Differential Revision: https://reviews.llvm.org/D85741
In an earlier patch I meant to add the correct flags to the ADD
node when incrementing the pointer, but forgot to pass them to
SelectionDAG::getNode.
Differential Revision: https://reviews.llvm.org/D87496
This patch fixes a problem of the commit 52cc97a0.
A test case is created to demonstrate the crash caused by
the instruction iterator invalidated by the recursive
removal of dead operands of assume. The solution restarts
from the blocks's first instruction in case CurInstIterator
is invalidated by RecursivelyDeleteTriviallyDeadInstructions().
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D87434
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.
So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't
propagated into new fadd. This patch fixes that.
Some code in visitFMA is redundant and such support for vector constants
is missing. Need follow-up patch to clean.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87037
The recently added optimizePhiType algorithm had no checks to make sure
it didn't continually iterate backward and forth between float and int
types. This means that given an input like store(phi(bitcast(load))), we
could convert that back and forth to store(bitcast(phi(load))). This
particular case would usually have been simplified to a different load
type (folding the bitcast into the load) before CGP, but other cases can
occur. The one that came up was phi(bitcast(phi)), where the two phi's
of different types were bitcast between. That was not helped by a dead
bitcast being kept around which could make conversion look profitable.
This adds an extra check of the bitcast Uses or Defs, to make sure that
at least one is grounded and will not end up being converted back. It
also makes sure that dead bitcasts are removed, and there is a minor
change to include newly created Phi nodes in the Visited set so that
they do not need to be revisited.
Differential Revision: https://reviews.llvm.org/D82676
CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and
output types so the operand should have already been legalized when the
result type was legalized.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.
This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.
I've avoided vectors in this patch to avoid more legalization
complexity for this patch.
X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.
Fixes PR47433
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87209
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.
No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.
There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.
Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.
We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.
(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)
x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.
Differential Revision: https://reviews.llvm.org/D87391
Following up on D67687.
Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html
`CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences.
- Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`)
- `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design.
- `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D83608
This was landed but reverted in 5b9c2b1bea due to asan picking up a memory
leak. This is fixed in the change to InstrRefBasedImpl.cpp. Original
commit message follows:
[LiveDebugValues][NFC] Add instr-ref tests, adapt old tests
This patch adds a few tests in DebugInfo/MIR/InstrRef/ of interesting
behaviour that the instruction referencing implementation of
LiveDebugValues has. Mostly, these tests exist to ensure that if you
give the "-experimental-debug-variable-locations" command line switch,
the right implementation runs; and to ensure it behaves the same way as
the VarLoc LiveDebugValues implementation.
I've also touched roughly 30 other tests, purely to make the tests less
rigid about what output to accept. DBG_VALUE instructions are usually
printed with a trailing !debug-location indicating its scope:
!debug-location !1234
However InstrRefBasedLDV produces new DebugLoc instances on the fly,
meaning there sometimes isn't a numbered node when they're printed,
making the output:
!debug-location !DILocation(line: 0, blah blah)
Which causes a ton of these tests to fail. This patch removes checks for
that final part of each DBG_VALUE instruction. None of them appear to
be actually checking the scope is correct, just that it's present, so
I don't believe there's any loss in coverage here.
Differential Revision: https://reviews.llvm.org/D83054
This is to fix CodeView build failure https://bugs.llvm.org/show_bug.cgi?id=47287
after DIsSubrange upgrade D80197
Assert condition is now removed and Count is calculated in case LowerBound
is absent or zero and Count or UpperBound is constant. If Count is unknown
it is later handled as VLA (currently Count is set to zero).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87406
Halide users reported this here: https://llvm.org/pr46176
I reported the issue to MSVC here:
https://developercommunity.visualstudio.com/content/problem/1179643/msvc-copies-overaligned-non-trivially-copyable-par.html
This codepath is apparently not covered by LLVM's unit tests, so I added
coverage in a unit test.
If we want to support this configuration going forward, it means that is
in general not safe to pass a SmallVector<T, N> by value if alignof(T)
is greater than 4. This doesn't appear to come up often because passing
a SmallVector by value is inefficient and not idiomatic: it copies the
inline storage. In this case, the SmallVector<LLT,4> is captured by
value by a lambda, and the lambda is passed by value into std::function,
and that's how we hit the bug.
Differential Revision: https://reviews.llvm.org/D87475
The PointerReg arg was passed into the dependence function for an
assertion which no longer exists. So, this patch updates the dependence
functions to avoid the PointerReg in the signature.
Tests-Run: make check
This is the first in a series of patches to make implicit null checks
more general. This patch identifies instructions that preserves zero
value of a register and considers that as a valid instruction to hoist
along with the faulting load. See added testcases.
Reviewed-By: reames, dantrushin
Differential Revision: https://reviews.llvm.org/D87108
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.
This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D86548
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.
This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.
Differential Revision: https://reviews.llvm.org/D87415
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.
This is enabled only with optimizations enabled like SelectionDAG.
Differential Revision: https://reviews.llvm.org/D86824
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.
Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.
Differential Revision: https://reviews.llvm.org/D86665
This combine previously tried to take sequences like:
%cond = G_ICMP pred, a, b
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.
I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.
Differential Revision: https://reviews.llvm.org/D86664
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist. Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.
This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87251
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.
Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87252
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h.
Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.
Differential Revision: https://reviews.llvm.org/D86268
Patch by Eli Friedman.
Review: Ulrich Weigand
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.
It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D87215
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.
Differential Revision: https://reviews.llvm.org/D87112
a warning about clobbering reserved registers (NFC).
Also address some minor inefficiencies and style issues.
Differential Revision: https://reviews.llvm.org/D86088
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.
The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D87134
The post-index matcher, before it queries the target legality, walks uses
of some instructions which in pathological cases can be massive. Since
no targets actually support indexed loads yet, disable this to stop wasting
compile time on something which is going to fail anyway.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.
This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.
This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.
There is more that needs to be done in this area as discussed here:
Differential Revision: https://reviews.llvm.org/D86871
Review: Ulrich Weigand, Sanjay Patel
Reduce to forward declaration, add the Register.h include that we still needed, move CCState::ensureMaxAlignment into CallingConvLower.cpp as it was the only function that needed the full definition of MachineFunction.
Fix a few implicit dependencies further down.
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:
CodeGen/AArch64/sve-split-extract-elt.ll
CodeGen/AArch64/sve-split-insert-elt.ll
In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.
I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.
Differential Revision: https://reviews.llvm.org/D86431
- When an operand is changed into an immediate value or like, ensure their
target flags being cleared or set properly.
Differential Revision: https://reviews.llvm.org/D87109
This hashing scheme has been useful out of tree, and I want to start
experimenting with it. Specifically I want to experiment on the
MIRVRegNamer, MIRCanononicalizer, and eventually the MachineOutliner.
This diff is a first step, that optionally brings stable hashing to the
MIRVRegNamer (and as a result, the MIRCanonicalizer). We've tested this
hashing scheme on a lot of MachineOperand types that llvm::hash_value
can not handle in a stable manner.
This stable hashing was also the basis for
"Global Machine Outliner for ThinLTO" in EuroLLVM 2020
http://llvm.org/devmtg/2020-04/talks.html#TechTalk_58
Credits: Kyungwoo Lee, Nikolai Tillmann
Differential Revision: https://reviews.llvm.org/D86952
Use forward declarations and move the include down to dependent files that actually use it.
This also exposes a number of implicit dependencies on KnownBits.h
This helps SelectionDAGBuilder recognize the splat can be used as a uniform base.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D86371
After computing dependence, we check if it is safe to hoist by
identifying if it clobbers any liveIns in the sibling block (NullSucc).
This check is moved to its own function which will be used in the
soon-to-be modified dependence checking algorithm for implicit null
checks pass.
Tests-Run: lit tests on X86/implicit-*
Separated out some checks in isSuitableMemoryOp and added comments
explaining why some of those checks are done.
Tests-Run:X86 implicit null checks tests.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:
extract_subvec(binop(insert_subvec(a), insert_subvec(b)))
are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.
This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.
Differential Revision: https://reviews.llvm.org/D86450
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.
For e.g.
svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
try { bar(x); } catch (...) { *err = true; }
return x;
}
`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84737
As stated in section 6.1.1.2, DWARFv5, p. 142,
| The last entry for each name is followed by a zero byte that
| terminates the list. There may be gaps between the lists.
The patch changes emitting a 4-byte zero value to a 1-byte one, which
effectively removes the gap between entry lists, and thus saves
approximately 3 bytes per name; the calculation is not exact because
the total size of the table is aligned to 4.
Differential Revision: https://reviews.llvm.org/D86927
The member is not in use; the unit length for the table is emitted as
a difference between two labels. Moreover, the type of the member might
be misleading, because for DWARF64 the field should be 64 bit long.
Differential Revision: https://reviews.llvm.org/D86912
Previously if the source match we asserted that the destination
matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.
Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851
Differential Revision: https://reviews.llvm.org/D86507
This is needed for an upcoming change to how we translate conditional branches
which might generate these.
Differential Revision: https://reviews.llvm.org/D86383
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-load.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86697
fabs and fneg share a common transformation:
(fneg (bitconvert x)) -> (bitconvert (xor x sign))
(fabs (bitconvert x)) -> (bitconvert (and x ~sign))
This patch separate the code into a single method.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86862
I tried to fix this in:
rG716e35a0cf53
...but that patch depends on the order that we encounter the
magic "x/sqrt(x)" expression in the combiner's worklist.
This patch should improve that by waiting until we walk the
user list to decide if there's a use to skip.
The AArch64 test reveals another (existing) ordering problem
though - we may try to create an estimate for plain sqrt(x)
before we see that it is part of a 1/sqrt(x) expression.
In general, we probably want to try the multi-use reciprocal
transform before sqrt transforms, but x/sqrt(x) is a special-case
because that will always reduce to plain sqrt(x) or an estimate.
The AArch64 tests show that the transform is limited by TLI
hook to patterns where there are 3 or more uses of the divisor.
So this change can result in an extra division compared to
what we had, but that's the intended behvior based on the
current setting of that hook.
Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86687
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
We introduce a codegen optimization pass which splits functions into hot and cold
parts. This pass leverages the basic block sections feature recently
introduced in LLVM from the Propeller project. The pass targets
functions with profile coverage, identifies cold blocks and moves them
to a separate section. The linker groups all cold blocks across
functions together, decreasing fragmentation and improving icache and
itlb utilization.
We evaluated the Machine Function Splitter pass on clang bootstrap and
SPECInt 2017.
For clang bootstrap we observe a mean 2.33% runtime improvement with a
~32% reduction in itlb and stlb misses. Additionally, L1 icache misses
reduced by 9.5% while L2 instruction misses reduced by 20%.
For SPECInt we report the change in IntRate the C/C++
benchmarks. All benchmarks apart from mcf and x264 improve, on average
by 0.6% with the max for deepsjeng at 1.6%.
Benchmark % Change
500.perlbench_r 0.78
502.gcc_r 0.82
505.mcf_r -0.30
520.omnetpp_r 0.18
523.xalancbmk_r 0.37
525.x264_r -0.46
531.deepsjeng_r 1.61
541.leela_r 0.83
557.xz_r 0.15
Differential Revision: https://reviews.llvm.org/D85368
There is a subtle problem with new statepoint lowering scheme
when base and pointers are the same (see PR46917 for more context):
%1 = STATEPOINT ... %0, %0(tied-def 0)...
if, for some reason, register allocator desides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with
$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...
and nothing will prevent later passes to sink uses of $reg2 below
statepoint, which is incorrect.
As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D86712
With gcc 6.3.0, I hit the following compilation error:
../lib/CodeGen/GlobalISel/Combiner.cpp: In member function
‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&,
llvm::GISelCSEInfo*)’:
../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses
around ‘&&’ within ‘||’ [-Werror=parentheses]
assert(!CSEInfo || !errorToBool(CSEInfo->verify()) &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
"CSEInfo is not consistent. Likely missing calls to "
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"observer on mutations");
Fix the code as suggested by the compiler.
This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node.
```
if (NegX && (CostX <= CostY)) {
Cost = std::min(CostX, CostZ);
RemoveDeadNode(NegY);
return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX.
}
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86689
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
Original D81646 had check for tied regs in foldPatchpoint().
Due to unfortunate miscommunication with review comments and
adressing some comments post commit, it turned into assertion.
We had an offline talk and agreed that with current implementation
this path is possible, so I'm changing it back to check.
Note that this is workaround until ussues described in PR46917 are
resolved.
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
It's possible to have a single virtual register def with a subreg
index that would pass the previous check, but it's not possible to
have a subregister def in SSA.
This is in preparation for adding stricter checks for SSA MIR.
https://reviews.llvm.org/D83833
Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and to combine away selects with constant comparisons. Patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.
Patch by mkitzan
https://reviews.llvm.org/D86676
Sometimes we can have the following code
x:gpr(s32) = G_OP
Say we build G_OP2 to the same x and then delete the previous instruction. Using something like
Register X = ...;
auto NewMIB = CSEBuilder.buildOp2(X, ... args);
Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type).Unify the profiling by refactoring and calling the common method.
This was found by turning on the CSEInfo::verify in at the end of each of our GISel passes which turns inconsistent state/non determinism in CSEing into crashes which likely usually indicates missing calls to Observer on mutations (the most common case). Here non determinism usually means not cseing sometimes, but almost never about producing incorrect code.
Also this patch adds this verification at the end of the combiners as well.
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.
This patch fixes the issue by properly propagating the Calling
Convention details.
This fixes Bugzilla #47001.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86715
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test to waters with this to make sure changes
like these are acceptable for the strictfp case- this particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.
Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
struct_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.
Differential Revision: https://reviews.llvm.org/D85548
This reverts commit b9d977b0ca.
This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a
performance improvement which was tested against the motivating case for this
patch.
Discussed in differential revision: https://reviews.llvm.org/D86153
Almost NFC (see end).
The backwards scan in validThroughout significantly contributed to compile time
for a pathological case, causing the 'X86 Assembly Printer' pass to account for
roughly 70% of the run time. This patch guards the loop against running
unnecessarily, bringing the pass contribution down to 4%.
Almost NFC: There is a hack in validThroughout which promotes single constant
value DBG_VALUEs in the prologue to be live throughout the function. We're more
likely to hit this code path with this patch applied. Similarly to the parent
patches there is a small coverage change reported in the order of 10s of bytes.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86153
With the changes introduced in D86151 we can now check for single locations
which span multiple blocks for inlined scopes and blocks.
D86151 introduced the InstructionOrdering parameter, replacing a scan through
MBB instructions. The functionality to compare instruction positions across
blocks was add there, and this patch just removes the exit checks that were
previously (but no longer) required.
CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds.
llvm-locstats (using D85636) shows a very small variable location coverage
change in 5 of 10 binaries, but just like in D86151 it is only in the order of
10s of bytes.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D86152
With this patch we're now accounting for two more cases which should be
considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where
RangeEnd comes before ScopeEnd when including meta instructions, but are both
preceded by the same non-meta instruction.
CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds.
`llvm-locstats` (using D85636) shows a very small variable location coverage
change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines
up with my expectations.
I've added a test which checks both of these new cases. The first check in the
test isn't strictly necessary for this patch. But I'm not sure that it is
explicitly tested anywhere else, and is useful for the final patch in the
series.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86151
Group the map and methods used to query instruction ordering for trimVarLocs
(D82129) into a class. This will make it easier to reuse the functionality
upcoming patches.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86150
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86101
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.
Differential Revision: https://reviews.llvm.org/D40061
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
Summary:
When looking for all reaching definitions, we sort basic blocks on dominance. When sorting looking for properlyDominates() handles the case A == B.
Authored by: pranavb
Differential Revision: https://reviews.llvm.org/D86661
We have a gap in our store merging capabilities for shift+truncate
patterns as discussed in:
https://llvm.org/PR46662
I generalized the code/comments for this function in earlier commits,
so we only need ease the type restriction and adjust the address/endian
checking to make this work.
AArch64 lets us switch endian to make sure that patterns are matched
either way.
Differential Revision: https://reviews.llvm.org/D86420
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
Before calling target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake cluster due to dependency.
For now, we are putting the loads/stores into the same group if they have
the same predecessor. We assume that, if two loads/stores have the same
predecessor, it is likely that, they didn't have dependency for each other.
However, one SUnit might have several predecessors and for now, we just
pick up the first predecessor that has non-data/non-artificial dependency,
which is too arbitrary. And we are struggling to fix it.
So, I am proposing some better implementation.
1. Collect all the loads/stores that has memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seeking for the first non-dependency instruction with the
sorted order, and check if they can cluster or not.
Reviewed By: Jay Foad
Differential Revision: https://reviews.llvm.org/D85517
If the basic block of the instruction passed to getUniqueReachingMIDef
is a transitive predecessor of itself and has a definition of the
register, the function will return that definition even if it is after
the instruction given to the function. This patch stops the function
from scanning the instruction's basic block to prevent this.
Differential Revision: https://reviews.llvm.org/D86607
There are two ways .llvmbc can be produced:
* clang -c -fembed-bitcode=all (which also produces .llvmcmd)
* LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode
.llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by
--gc-sections.
This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This
is conceptually correct: the two sections are not part of the process
image, so SHF_ALLOC is not appropriate.
`test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to
`llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D86374
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the
backedge-taken count.
Differential Revision: https://reviews.llvm.org/D86302
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.
Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".
Differential Revision: https://reviews.llvm.org/D83048
The arm backend does not handle select/select_cc on vectors with scalar
conditions, preferring to expand them in codegenprepare instead. This
usually works except when optimizing for size, where the optsize check
would end up overruling the backend isSelectSupported check.
We could handle the selects in ISel too, but this seems like smaller
code than trying to splat the condition to all lanes.
Differential Revision: https://reviews.llvm.org/D86433
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.
Differential Revision: https://reviews.llvm.org/D86415
Explicitly check that there is a local def prior to the given
instruction in getReachingLocalMIDef instead of just relying on
a nullptr return from getInstFromId.
With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X).
This is done after targets have the chance to produce a
reciprocal sqrt estimate sequence because that expansion
is probably more efficient than an expansion of a
non-reciprocal sqrt. That is also why we deferred doing
this transform in IR (D85709).
Differential Revision: https://reviews.llvm.org/D86403
D77152 tried to do this but got it wrong in the shift-by-zero case.
D86430 reverted the wrong code. Reimplement the optimization with
different code depending on whether the shift amount is known to be
non-zero (modulo bitwidth).
This improves code quality for fshl tests on AMDGPU, which only has an
fshr instruction.
Differential Revision: https://reviews.llvm.org/D86438
shl ([sza]ext x, y) => zext (shl x, y).
Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:
This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.
TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.
Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
This is a fixup of commit 0819a6416f (D77152) which could
result in miscompiles. The miscompile could only happen for targets
where isOperationLegalOrCustom could return different values for
FSHL and FSHR.
The commit mentioned above added logic in expandFunnelShift to
convert between FSHL and FSHR by swapping direction of the
funnel shift. However, that transform is only legal if we know
that the shift count (modulo bitwidth) isn't zero.
Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a
rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if
Z modulo bitwidth, could be zero.
```
$ ./alive-tv /tmp/test.ll
----------------------------------------
define i32 @src(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = fshl i32 %x, i32 %y, i32 %z
ret i32 %t0
}
=>
define i32 @tgt(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = sub i32 32, %z
%t1 = fshr i32 %x, i32 %y, i32 %t0
ret i32 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
i32 %x = #x00000000 (0)
i32 %y = #x00000400 (1024)
i32 %z = #x00000000 (0)
Source:
i32 %t0 = #x00000000 (0)
Target:
i32 %t0 = #x00000020 (32)
i32 %t1 = #x00000400 (1024)
Source value: #x00000000 (0)
Target value: #x00000400 (1024)
```
It could be possible to add back the transform, given that logic
is added to check that (Z % BW) can't be zero. Since there were
no test cases proving that such a transform actually would be useful
I decided to simply remove the faulty code in this patch.
Reviewed By: foad, lebedev.ri
Differential Revision: https://reviews.llvm.org/D86430
D70867 introduced support for expanding most ppc_fp128 operations. But
sitofp/uitofp is missing. This patch adds that after D81669.
Reviewed By: uweigand
Differntial Revision: https://reviews.llvm.org/D81918
The pattern matching does not account for truncating stores,
so it is unlikely to work at later stages. So we are likely
wasting compile-time with no hope of improvement by running
this later.
This should be NFC in terms of output because the endian
check further down would bail out too, but we are wasting
time by waiting to that point to give up. If we generalize
that function to deal with more than i8 types, we should
not have to deal with the degenerate case.
This patch imports the instruction-referencing implementation of
LiveDebugValues proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
The new implementation is unreachable in this patch, it's the next patch
that enables it behind a command line switch. Briefly, rather than
tracking variable locations by just their location as the 'VarLoc'
implementation does, this implementation does it by value:
* Each value defined in a function is numbered, and propagated through
dataflow,
* Each DBG_VALUE reads a machine value number from a machine location,
* Variable _values_ are propagated through dataflow,
* Variable values are translated back into locations, DBG_VALUEs
inserted to specify where those locations are.
The ultimate aim of this is to enable referring to variable values
throughout post-isel code, rather than locations. Those patches will
build on top of this new LiveDebugValues implementation in later patches
-- it can't be done with the VarLoc implementation as we don't have
value information, only locations.
Differential Revision: https://reviews.llvm.org/D83047
This patch renames the current LiveDebugValues class to "VarLocBasedLDV"
and removes the pass-registration code from it. It creates a separate
LiveDebugValues class that deals with pass registration and management,
that calls through to VarLocBasedLDV::ExtendRanges when
runOnMachineFunction is called. This is done through the "LDVImpl"
abstract class, so that a future patch can install the new
instruction-referencing LiveDebugValues implementation and have it
picked at runtime.
No functional change is intended, just shuffling responsibilities.
Differential Revision: https://reviews.llvm.org/D83046
This is a pure file move of LiveDebugValues.cpp ahead of the pass being
refactored, with an experimental new implementation to follow.
The motivation for these changes can be found here:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
And the other related changes can be found in the phabricator stack for
this revision:
Differential Revision: https://reviews.llvm.org/D83304
This patch adds support for representing Fortran `character(n)`.
Primarily patch is based out of D54114 with appropriate modifications.
Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.
Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
The register class is required for inserting PHIs, but the "current
virtual register" isn't actually used for anything, so let's remove it
while we're at it.
Differential Revision: https://reviews.llvm.org/D85602
Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing
us to treat the high bits as zero even though they're (by definition)
unknown.
Differential Revision: https://reviews.llvm.org/D86323
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
The byte swapping, when dealing with 4 byte (float) FP constants
in DwarfExpression::addConstantFP, added in commit ef8992b9f0
was not correct. It always performed byte swapping using an
uint64_t value. When dealing with 4 byte values the 4 interesting
bytes ended up in the big end of the uint64_t, but later we emitted
the 4 bytes at the little end. So we ended up with zeroes being
emitted and faulty debug information.
This patch simplifies things a bit, IMHO. Using the APInt
representation throughout the function, instead of looking at
the internal representation using getRawBytes and without using
reinterpret_cast etc. And using API.byteSwap() should result in
correct byte swapping independent of APInt being 4 or 8 bytes.
Differential Revision: https://reviews.llvm.org/D86272
The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
This patch was reverted in 7c182663a8 due to some failures
observed on PCC based machines. Failures were due to Endianness issue and
long double representation issues.
Patch is revised to address Endianness issue. Furthermore, support
for emission of `DW_OP_implicit_value` for `long double` has been removed
(since it was unclean at the moment). Planning to handle this in
a clean way soon!
For more context, please refer to following review link.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.
For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.
Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.
I've added new tests in
CodeGen/AArch64/sve-split-load.ll
CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorLoads.
Differential Revision: https://reviews.llvm.org/D85909
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.
Add a helper function, `addFlagsUsingAttrFn` which performs the attribute
setting.
Then, use wrappers for that function in `lowerCall` and `setArgFlags`.
(Note that the flag-setting code in `setArgFlags` was missing the returned
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)
Differential Revision: https://reviews.llvm.org/D86159
Similar to this commit:
faf8065a99
Testcase is pretty much the same as
test/CodeGen/AArch64/tailcall-explicit-sret.ll
Except it uses i64 (since we don't handle the i1024 return values yet), and
doesn't have indirect tail call testcases (because we can't translate those
yet).
Differential Revision: https://reviews.llvm.org/D86148
This is restricted to single use loads, which if we fold to sextloads we can
find more optimal addressing modes on AArch64.
This also fixes an overload the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.
Differential Revision: https://reviews.llvm.org/D85966
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.
Differential Revision: https://reviews.llvm.org/D85965
We weren't looking through the parameters on calls at all.
E.g., say you had
```
declare i32 @zext(i32 zeroext %x)
...
%y = call i32 @zext(i32 %something)
...
```
At the point of the call, we wouldn't know that the %something should have the
zeroext attribute.
This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.
Differential Revision: https://reviews.llvm.org/D86125
Theory was that we should never reach a non-type unit (eg: type in an
anonymous namespace) when we're already in the invalid "encountered an
address-use, so stop emitting types for now, until we throw out the
whole type tree to restart emitting in non-type unit" state. But that's
not the case (prior commit cleaned up one reason this wasn't exposed
sooner - but also makes it easier to test/demonstrate this issue)
This reads more like what you'd expect the DWARF to look like (from the
lexical order of C++ - template parameters come before members, etc),
and also happens to make it easier to tickle (& thus test) a bug related
to type units and Split DWARF I'm about to fix.
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.
However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.
Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
The "isa" checks were less constrained because they allow
target constants, but the later matching code would bail
out on those anyway, so this should be slightly more
efficient.
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
Also doesn't try to handle undef elements like the DAG version.
Extend FixupStatepointCallerSaved pass with ability to spill
statepoint GC pointer arguments (optionally allowing them on CSRs).
Special handling is required for invoke statepoints, because at MI
level single landing pad may be shared by multiple statepoints, so
we must ensure we spill landing pad's live-ins into the same stack
slots.
Full statepoint refactoring change set is available at D81603.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D81647
This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly
for scalable vectors and also fixes an a bug in DAGCombiner where
the scalable property is dropped in visitTRUNCATE when attempting
to fold an extract + a truncate.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85754
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.
Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.
Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.
I've added new tests in
CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorStores.
Differential Revision: https://reviews.llvm.org/D84937
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:
MachineFunction::getMachineMemOperand
that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.
I've added a new test for this particular case in:
CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D83950
This is mostly a straight port from SelectionDAG. We re-use the actual bit-test
analysis part from SwitchLoweringUtils, which was factored out earlier to
support jump-tables.
Differential Revision: https://reviews.llvm.org/D85233
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
SUMMARY:
1. in the patch , remove setting storageclass in function .getXCOFFSection and construct function of class MCSectionXCOFF
there are
XCOFF::StorageMappingClass MappingClass;
XCOFF::SymbolType Type;
XCOFF::StorageClass StorageClass;
in the MCSectionXCOFF class,
these attribute only used in the XCOFFObjectWriter, (asm path do not need the StorageClass)
we need get the value of StorageClass, Type,MappingClass before we invoke the getXCOFFSection every time.
actually , we can get the StorageClass of the MCSectionXCOFF from it's delegated symbol.
2. we also change the oprand of branch instruction from symbol name to qualify symbol name.
for example change
bl .foo
extern .foo
to
bl .foo[PR]
extern .foo[PR]
3. and if there is reference indirect call a function bar.
we also add
extern .bar[PR]
Reviewers: Jason liu, Xiangling Liao
Differential Revision: https://reviews.llvm.org/D84765
This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
Differential Revision: https://reviews.llvm.org/D85050
Allow the GNU .debug_macro extension to be emitted for DWARF versions
earlier than 5. The extension is basically what became DWARF 5's format,
except that a DW_AT_GNU_macros attribute is emitted, and some entries
like the strx entries are missing. In this patch I emit GNU's indirect
entries, which are the same as DWARF 5's strp entries.
This patch adds the extension behind a hidden LLVM flag,
-use-gnu-debug-macro. I would later want to enable it by default when
tuning for GDB and targeting DWARF versions earlier than 5.
The size of a Clang 8.0 binary built with RelWithDebInfo and the flags
"-gdwarf-4 -fdebug-macro" reduces from 1533 MB to 1349 MB with
.debug_macro (compared to 1296 MB without -fdebug-macro).
Reviewed By: SouraVX, dblaikie
Differential Revision: https://reviews.llvm.org/D82975
Broken out from a review comment on D82975. This is an NFC expect for
that the Macinfo macro string is now emitted using a single emitBytes()
invocation, so it can be done using a single string directive.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D83557
This mirrors the support for the equivalent extracts. This also
creates a huge mess that would be greatly improved if we had any bit
operation combines.
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874
Move the Dwarf version checks that determine if the .debug_macro section
should be emitted, into a DwarfDebug member. This is a preparatory
refactoring for allowing the GNU .debug_macro extension, which is a
precursor to the DWARF 5 format, to be emitted by LLVM for earlier DWARF
versions.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D82971
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
We skip debug instructions in RDA so we cannot attempt to look them
up in our instruction map without causing a crash. But some of the
methods select the last instruction in the block and this
instruction may be a debug instruction... So, use getLastNonDebugInstr
instead of calling back on a MachineBasicBlock.
MachineBasicBlock iterators have also been updated to use
instructionsWithoutDebug so we can avoid the manual checks for debug
instructions.
Differential Revision: https://reviews.llvm.org/D85658
Summary:
Use TE SMC instead of TC SMC in large code model mode,
so that large code model TOC entries could get placed after all
the small code model TOC entries, which reduces the chance of TOC overflow.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D85455
SplitKit forms invalid COPY subreg bundles without a leading
BUNDLE instruction. That manifests itself in post-RA scheduler
counting instruction and asserting on "Instruction count mismatch".
The bundle shall be undone by VirtRegRewriter::expandCopyBundle(),
but it does not because VirtRegRewriter::handleIdentityCopy() can
turn COPY bundle into a KILL bundle.
Process KILLs as well.
Differential Revision: https://reviews.llvm.org/D85484
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).
Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
X86 is the only user of this interface in tree. Previously the
X86 pass would loop over operands looking for one undef operand for
the pass to fix. But there could theoretically be multiple operands
to fix. So it makes more sense for the pass to do the looping and
ask the target if an operand needs to be fixed.
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.
On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.
Differential Revision: https://reviews.llvm.org/D84534
As noticed on D66004, scalarization of an expandload with a constant mask as a chain of irregular loads+inserts makes it tricky to optimize before lowering, resulting in difficulties in merging loads etc.
This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc.
Differential Revision: https://reviews.llvm.org/D85416
This also fixes the condition in the assertion in
DwarfCompileUnit::getLabelBegin() because it checked something unrelated
to the returned value.
Differential Revision: https://reviews.llvm.org/D85437
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.
While there make use of APInt:isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use
SelectionDAG::getAllOnesConstant helper to hide the all ones
APInt creation.
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).
The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
This removes members of the DIEUnit class which were used only in unit
tests. Note also that child classes shadowed some of these methods,
namely, getDwarfVersion() was overridden in DwartfUnit and getLength()
was overridden in DwarfCompileUnit.
Differential Revision: https://reviews.llvm.org/D85436
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa
as they both depend on the base register "reg"
+-------+
+----> reg |
| +---+---+
| ^
| |
| |
| |
| +---+---+
| | SUa | Load 0(reg)
| +---+---+
| ^
| |
| |
| +---+---+
+----+ SUb | Load 4(reg)
+-------+
But if it is store cluster, we need to create it as follow shows to avoid the instruction store
depend on scheduled in-between SUb and SUa.
+-------+
+----> reg |
| +---+---+
| ^
| | Missing +-------+
| | +-------------------->+ y |
| | | +---+---+
| +---+-+-+ ^
| | SUa | Store x 0(reg) |
| +---+---+ |
| ^ |
| | +------------------------+
| | |
| +---+--++
+----+ SUb | Store y 4(reg)
+-------+
Reviewed By: evandro, arsenm, rampitec, foad, fhahn
Differential Revision: https://reviews.llvm.org/D72031
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.
Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.
Differential Revision: https://reviews.llvm.org/D85468
Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
If a function is in a unique section, putting all jump tables in
.rodata will prevent functions that have a jump table to get
garbage collect by the linker.
Therefore, we need to put jump table into a unique section as well.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D84761
Add given input and mark it as tied.
Doesn't create additional copy compared to
matching input constraint to virtual register.
Differential Revision: https://reviews.llvm.org/D85122
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.
Differential Revision: https://reviews.llvm.org/D85328
This patch changes the functionality of AsmPrinter to name the basic block end labels as LBB_END${i}_${j}, with ${i} being the identifier for the function and ${j} being the identifier for the basic block. The new naming scheme is consistent with how basic block labels are named (.LBB${i}_{j}), and how function end symbol are named (.Lfunc_end${i}) and helps to write stronger tests for the upcoming patch for BB-Info section (as proposed in https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). The end label is used with basicblock-labels (BB-Info section in future) and basicblock-sections to compute the size of basic blocks and basic block sections, respectively. For BB sections, the section containing the entry basic block will not have a BB end label since it already gets the function end-label.
This label is cached for every basic block (CachedEndMCSymbol) like the label for the basic block (CachedMCSymbol).
Differential Revision: https://reviews.llvm.org/D83885
Implement proper folding of statepoint meta operands (deopt and GC)
when statepoint uses tied registers.
For deopt operands it is just about properly preserving tiedness
in new instruction.
For tied GC operands folding is a little bit more tricky.
We can fold tied GC operands only from InlineSpiller, because it knows
how to properly reload tied def after it was turned into memory operand.
Other users (e.g. peephole) cannot properly fold such operands as they
do not know how (or when) to reload them from memory.
We do this by un-tieing operand we want to fold in InlineSpiller
and allowing to fold only untied operands in foldPatchpoint.
We currently don't do anything to fold any_extend vector loads as no target has such an instruction.
Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on.
We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication.
Fixes the regression I mentioned in rG99a971cadff7
Differential Revision: https://reviews.llvm.org/D85129
Currently, we only test the `--stackmap` option here:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Object/stackmap-dump.test
it uses a precompiled MachO binary currently and I've found no tests for this option for ELF.
The implementation also has issues. For example, it might assert on a wrong version
of the .llvm-stackmaps section. Or it might crash on an empty or truncated section.
This patch introduces a new tools/llvm-readobj/ELF test file as well as implements a few
basic checks to catch simple crashes/issues
It also eliminates `unwrapOrError` calls in `printStackMap()`.
Differential revision: https://reviews.llvm.org/D85208
The sorting is needed, because reaching defs are (logically) ordered,
but are not collected in that order. This change will break up the
single call to std::sort into a series of smaller sorts, each of which
should use a cheaper comparison function than the original.
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.
This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.
I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
This corresponds with the SelectionDAGISel change in D84056.
Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC.
Differential Revision: https://reviews.llvm.org/D85149
This one is pretty easy and shrinks the list of unhandled
intrinsics. I'm not sure how relevant the insert point is. Using the
insert position of EntryBuilder will place this after
constants. SelectionDAG seems to end up emitting these after argument
copies and before anything else, but I don't think it really
matters. This also ends up emitting these in the opposite order from
SelectionDAG, but I don't think that matters either.
This also needs a fix to stop the later passes dropping this as a dead
instruction. DeadMachineInstructionElim's version of isDead special
cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded
from MachineInstr::isLabel (or why isDead doesn't check it).
I also noticed DeadMachineInstructionElim never considers inline asm
as dead, but GlobalISel will drop asm with no constraints.
This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR.
This corresponds with the SelectionDAGISel change in D84056.
Differential Revision: https://reviews.llvm.org/D85139
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
Differential Revision: https://reviews.llvm.org/D83948
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:
SP + offset + scalable_offset * VG
where VG is the Vector Granule register, containing the
number of 64bits 'granules' in a scalable vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84043
Part of https://bugs.llvm.org/show_bug.cgi?id=41734
LTO can drop externally available definitions. Such AssociatedSymbol is
not associated with a symbol. ELFWriter::writeSection() will assert.
Allow a SHF_LINK_ORDER section to have sh_link=0.
We need to give sh_link a syntax, a literal zero in the linked-to symbol
position, e.g. `.section name,"ao",@progbits,0`
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D72899
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D84056
Use pad with undef and unmerge with unused results. This is annoyingly
similar to several other places in LegalizerHelper, but they're all
slightly different.
The patch restricts DIEDelta::SizeOf() to accept only DWARF forms that
are actually used in the LLVM codebase. This should make the use of the
class more explicit and help to avoid issues similar to fixed in D83958
and D84094.
Differential Revision: https://reviews.llvm.org/D84095
DIELabel can emit only 32- or 64-bit values, while it was created in
some places with DW_FORM_udata, which implies emitting uleb128.
Nevertheless, these places also expected to emit U32 or U64, but just
used a misleading DWARF form. The patch updates those places to use more
appropriate DWARF forms and restricts DIELabel::SizeOf() to accept only
forms that are actually used in the LLVM codebase.
Differential Revision: https://reviews.llvm.org/D84094
DIELocList is used with a limited number of DWARF forms, see the only
place where it is instantiated, DwarfCompileUnit::addLocationList().
The patch marks the unexpected execution path in DIELocList::SizeOf()
as unreachable, to reduce ambiguity.
Differential Revision: https://reviews.llvm.org/D84092
For AMDGPU, vectors with elements < 32 bits should be indexed in
32-bit elements and the desired bits extracted from there. For
elements > 64-bits, these should be reduce to 64/32 elements to enable
the normal dynamic indexing paths.
In the dynamic index cases, this produces shorter code most of the
time. This does immediately regress the constant index cases, but this
should be fixed once we have the most basic of shift combines.
The element size > 64 case is pretty much ported from the exisiting
DAG implementation for extract element promote. The increasing element
size case is new.
Try to be more consistent with the SDLoc param in the TargetLowering methods.
This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.
D68049 created options for basic block sections: -fbasic-block-sections=,
-funique-basic-block-section-names. Rename options in llc and lld (--lto-)
to be consistent. Specifically,
+ Rename basicblock-sections to basic-block-sections
+ Rename unique-bb-section-names to unique-basic-block-section-names
Differential Revision: https://reviews.llvm.org/D84462
https://reviews.llvm.org/D84909
Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
Patch by mkitzan
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.
Differential Revision: https://reviews.llvm.org/D84706
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
In cases where the alignment of the datatype is smaller than
expected by the instruction, the address is aligned. The aligned
address is used for the load, but wasn't used for the store
conditional, which resulted in a run-time alignment exception.
Summary:
This patch implements -ffunction-sections on AIX.
This patch focuses on assembly generation.
Follow-on patch needs to handle:
1. -ffunction-sections implication for jump table.
2. Object file generation path and associated testing.
Differential Revision: https://reviews.llvm.org/D83875
Summary:
In the phi-node-elimination pass, we set the killed flag incorrectly.
When we eliminate the PHI node, we replace the PHI with a copy for the
incoming value.
Before this patch, we will set incoming value as killed(PHICopy). And
we will remove the killed flag from last using incoming value(OldKill).
This is correct, only if the new PHICopy is after the OldKill.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80886
I still think it's highly questionable that we have two intrinsics
with identical behavior and only vary by the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
This change is mechanical, it just removes the restriction and updates tests. The key building blocks were submitted in 31342eb and 8fe2abc.
Note that this (and preceeding changes) entirely subsumes D83965. I did includes a couple of it's tests.
From the codegen changes, an interesting observation: this doesn't actual reduce spilling, it just let's the register allocator do it's job. That results in a slightly different overall result which has both pros and cons over the eager spill lowering. (i.e. We'll have some perf tuning to do once this is stable.)
Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.
This is submitted separately as it really does make the code more readible on it's own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
Differential Revision: https://reviews.llvm.org/D84866
This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.
The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.
Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into it's own change for easy of understanding and fault isolation.
The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and it's source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.
I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.
Differential Revision: https://reviews.llvm.org/D84692
Summary:
When doing MachineVerifier for LiveVariables, the MachineVerifier pass
will calculate the LiveVariables, and compares the result with the
result livevars pass gave. If they are different, verifyLiveVariables()
will give error.
But when we calculate the LiveVariables in MachineVerifier, we don't
consider the PHI node, while livevars considers.
This patch is to fix above bug.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80274
In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.
The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.
A new test is added. This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
umull r9, r9, lr, r0
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D82638
I have added tests to:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:
1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
Differential revision: https://reviews.llvm.org/D84016
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.
Fixes warnings in this test:
sve-sext-zext.ll
Differential revision: https://reviews.llvm.org/D83198
The refactoring encapsulates frequency calculation in MachineBlockFrequencyInfo,
and renames the API to clarify its motivation. It should clarify
frequencies may not be reset 'freely' by users of the analysis, as the
API serves as a partial update to avoid a full analysis recomputation.
Differential Revision: https://reviews.llvm.org/D84427
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
The move constructor of MachineModuleInfo currently does not copy the
MachineFunctions map. This commit fixes this issue.
Patch by Sridhar Gopinath. Thanks!
Differential Revision: https://reviews.llvm.org/D84274
Speed up the method RequiresStackProtector by checking the intrinsic
value of the call. The original code calls getName() that returns an
allocating std::string on each check. This change removes about 96072
std::string instances when compiling sqlite3.c; The function was
discovered with a Facebook-internal performance tool.
Differential Revision: https://reviews.llvm.org/D84620
I have introduced a new TargetFrameLowering query function:
isStackIdSafeForLocalArea
that queries whether or not it is safe for objects of a given stack
id to be bundled into the local area. The default behaviour is to
always bundle regardless of the stack id, however for AArch64 this is
overriden so that it's only safe for fixed-size stack objects.
There is future work here to extend this algorithm for multiple local
areas so that SVE stack objects can be bundled together and accessed
from their own virtual base-pointer.
Differential Revision: https://reviews.llvm.org/D83859
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads.
Current implementation will put these two stores into different group and miss to cluster them.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D84139
Summary:
In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification with the consideration of compile time.
Reviewers:
@spatel, @arsenm
Differential Revision:
https://reviews.llvm.org/D84204
(Disabled under flag for the moment)
This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator. Today, we force spill all operands in SelectionDAG. The code to do so is distinctly non-optimal. The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to. In terms of performance, the later part is actually more important as it avoids redundant shuffling of values between stack slots.
This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above. In particular, the first N are lowered to variadic tied def/use pairs. So new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...
N is limited by the maximal number of tied registers machine instruction can have (15 at the moment).
The current patch is restricted to handling relocations within a single basic block. Cross block relocations (e.g. invokes) are handled via the legacy mechanism. This restriction will be relaxed in future patches.
Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
Common up some existing MBB name printing logic into a single place.
Note that basic block dumping now prints the same set of attributes as
the MIRPrinter.
Change-Id: I8f022bbd922e831bc96d63143d7472c03282530b
Differential Revision: https://reviews.llvm.org/D83253
Emit DWARF 5 call-site symbols even though DWARF 4 is set,
only in the case of LLDB tuning.
This patch addresses PR46643.
Differential Revision: https://reviews.llvm.org/D83463
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
SONY debugger does not prefer debug entry values feature, so
the plan is to avoid production of the entry values
by default when the tuning is SCE debugger.
The feature still can be enabled with the -debug-entry-values
option for the testing/development purposes.
This patch addresses PR46643.
Differential Revision: https://reviews.llvm.org/D83462
In the included test case the align 16 allowed the v23f32 load to handled as load v16f32, load v4f32, and load v4f32(one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need to two v4f32 undefs to pad it.
It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. Originally in r147964 by padding all loads narrower than previous loads to the same size. Later modifed to only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it.
Fixes PR46820
Differential Revision: https://reviews.llvm.org/D84463
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
Differential Revision: https://reviews.llvm.org/D84125
Clarify the relation between a block's BlockFrequency and the
getEntryFreq() API, and added an API for the relatively common case of
finding a block's frequency relative to the entrypoint.
Added / moved some comments to header.
Differential Revision: https://reviews.llvm.org/D84357
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.
Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skips out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
Summary: We do this already for output operands, but missed it for (non-tied) input operands.
Reviewers: arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83763
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.
define void @test(i4000000* %ptr) {
entry:
store i4000000 0, i4000000* %ptr, align 4
ret void
}
This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.
It would be even better to use a binary heap to manage to queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.
The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.
Reviewers: efriedma, paquette, niravd
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84328
Test case `test/CodeGen/WebAssembly/stackified-debug.ll`
was failing due to malformed DwarfExpression.
This failure has been seen in lot of bots, for instance in:
http://lab.llvm.org:8011/builders/lld-x86_64-ubuntu-fast/builds/18794
: 'RUN: at line 1'
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/llc
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/FileCheck /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll
home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll:26:10: error: CHECK: expected string not found in input
CHECK: .int16 4 # Loc expr size
^
<stdin>:34:2: note: scanning from here
.int16 3 # Loc expr size
Differential Revision: https://reviews.llvm.org/D83560
This patch was reverted in 9d2da6759b due to assertion failure seen
in `test/DebugInfo/Sparc/subreg.ll`. Assertion failure was happening
due to malformed/unhandeled DwarfExpression.
Differential Revision: https://reviews.llvm.org/D83560
Summary:
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.
For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
Summary: These should have half float as the element type
Reviewers: cameron.mcinally, efriedma, sdesmalen, paulwalker-arm
Reviewed By: paulwalker-arm
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84211
This is an alternative proposal to D81476 (and D82084) - the details were sufficiently confusing to me it seemed easier to write some code and see how it looks.
Reviewers: SouraVX
Differential Revision: https://reviews.llvm.org/D84278
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.
This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.
I'm also not sure what the correct behavior for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
Summary:
AIX assembly's .set directive is not usable for aliasing purpose.
We need to use extra-label-at-defintion strategy to generate symbol
aliasing on AIX.
Reviewed By: DiggerLin, Xiangling_L
Differential Revision: https://reviews.llvm.org/D83252
Summary:
This patch reduces file size in debug builds by dropping variable locations a
debugger user will not see.
After building the debug entity history map we loop through it. For each
variable we look at each entry. If the entry opens a location range which does
not intersect any of the variable's scope's ranges then we mark it for removal.
After visiting the entries for each variable we also mark any clobbering
entries which will no longer be referenced for removal, and then finally erase
the marked entries. This all requires the ability to query the order of
instructions, so before this runs we number them.
Tests:
Added llvm/test/DebugInfo/X86/trim-var-locs.mir
Modified llvm/test/DebugInfo/COFF/register-variables.ll
Branch folding merges the tails of if.then and if.else into if.else. Each
blocks' debug-locations point to different scopes so when they're merged we
can't use either. Because of this the variable 'c' ends up with a location
range which doesn't cover any instructions in its scope; with the patch
applied the location range is dropped and its flag changes to IsOptimizedOut.
Modified llvm/test/DebugInfo/X86/live-debug-variables.ll
Modified llvm/test/DebugInfo/ARM/PR26163.ll
In both tests an out of scope location is now removed. The remaining location
covers the entire scope of the variable allowing us to emit it as a single
location.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D82129
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.
The code is is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
Current tail duplication in machine block placement pass uses block frequency
information in cost model. But frequency number has only relative meaning
compared to other basic blocks in the same function. A large frequency number
doesn't mean it is hot and a small frequency number doesn't mean it is cold.
To overcome this problem, this patch uses profile count in cost model if it's
available. So we can tail duplicate real hot basic blocks.
Differential Revision: https://reviews.llvm.org/D83265
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.
Also make the handling mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
This isn't a natively supported operation, so convert it to a
mask+compare.
In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().
Differential Revision: https://reviews.llvm.org/D83811
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel which could not be done with
`-print-after`/`-stop-after` since isel pass does not have commandline name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to new pass manager since isel pass would have a
commandline text name to use `print-after` or equivalent switches.
The motivation of this patch is to reduce tests dependency on
would-be-deprecated feature.
Reviewed By: arsenm, dsanders
Differential Revision: https://reviews.llvm.org/D83275
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.
Differential Revision: https://reviews.llvm.org/D84010
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.
This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop.
Differential Revision: https://reviews.llvm.org/D83953
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring a debug value that gets folded into a TCRETURN
instruction on ARM.
This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators produce
values that can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like currently there is no way to insert a proper DBG_VALUEs
for such registers anyways.
Alternatively it might make sense to just remove those extra DBG_VALUES.
I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.
Reviewers: vsk, aprantl, jpaquette, efriedma, paquette
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83561
For now, DIEExpr is used only in two places:
1) in the debug info library unit test suite to emit
a DW_AT_str_offsets_base attribute with the DW_FORM_sec_offset
form, see dwarfgen::DIE::addStrOffsetsBaseAttribute();
2) in DwarfCompileUnit::addLocationAttribute() to generate the location
attribute for a TLS variable.
The later case used an incorrect DWARF form of DW_FORM_udata, which
implies storing an uleb128 value, not a 4/8 byte constant. The generated
result was as expected because DIEExpr::SizeOf() did not handle the used
form, but returned the size of the code pointer by default.
The patch fixes the issue by using more appropriate DWARF forms for
the problematic case and making DIEExpr::SizeOf() more straightforward.
Differential Revision: https://reviews.llvm.org/D83958
Replace std::vector with SmallVector to reduce the number of mallocs.
This method is frequently executed, and the number of elements in the
vector is typically small.
https://reviews.llvm.org/D83920
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.
Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
Basic support for variadic-def MIR Statepoint:
- Change TableGen STATEPOINT description to variadic out list
(For self-documentation purpose; by itself it does not affect
code generation in any way).
- Update StatepointOpers helper class to handle variadic defs.
- Update MachineVerifier to properly handle them, too.
With this change, new Statepoint instruction can be passed through
backend (excluding ISEL) without errors.
Full change set is available at D81603.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81645
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.
Differential Revision: https://reviews.llvm.org/D83880
This function has a bug which will incorrectly reschedule instructions
after an INLINEASM_BR (which can branch). (The bug may also allow
scheduling past a throwing-CALL, I'm not certain.)
I could fix that bug, but, as the removed FIXME notes, it's better to
attempt rescheduling before converting to 3-addr form, as that may
remove the need to convert in the first place. In fact, the code to do
such reordering was added to this pass only a few months later, in
2011, via the addition of the function rescheduleMIBelowKill. That
code does not contain the same bug.
The removal of the sink3AddrInstruction function is not a no-op: in
some cases it would move an instruction post-conversion, when
rescheduleMIBelowKill would not move the instruction pre-converison.
However, this does not appear to be important: the machine instruction
scheduler can reorder the after-conversion instructions, in any case.
This patch fixes a kernel panic 4.4 LTS x86_64 Linux kernels, when
built with clang after 4b0aa5724f.
Link: https://github.com/ClangBuiltLinux/linux/issues/1085
Differential Revision: https://reviews.llvm.org/D83708
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.
Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma, david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83137
When we calculate the weight of a live-interval, add some code to
check if the original live-interval was markied as not spillable and
if so, progagate that information down to the new interval.
Previously we would just recompute a weight for the new interval,
thus, we could in theory just spill live-intervals marked as not
spillable by just splitting them. That goes against the spirit of
a non-spillable live-interval.
E.g., previously we could do:
v1 = // v1 must not be spilled
...
= v1
Split:
v1 = // v1 must not be spilled
...
v2 = v1 // v2 can be spilled
...
v3 = v2 // v3 can be spilled
= v3
There's no test case for that one as we would need to split a
non-spillable live-interval without using LiveRangeEdit to see this
happening.
RegAlloc inserts non-spillable intervals only as part of the spilling
mechanism, thus at this point the intervals are not splittable anymore.
On top of that, RegAlloc uses the LiveRangeEdit API, which already
properly propagate that information.
In other words, this could only happen if a target was to mark
a live-interval as not spillable before register allocation and
split it without using LRE, e.g., through
LiveIntervals::splitSeparateComponent.
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.
Differential Revision: https://reviews.llvm.org/D83882
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).
So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.
This fixes PR42876.
Differential Revision: https://reviews.llvm.org/D65802
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) have to be emitted along with redefining the
Call Frame Address (CFA), viz. where the current frame starts.
CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region unlike
debug info which has the support for ranges. This is what complicates CFI for
basic block sections.
Now, what happens when we start placing individual basic blocks in unique
sections:
* Basic block sections allow the linker to randomly reorder basic blocks in the
address space such that a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit this
independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee saved registers in the prologue. There are cfi
directives that emit how to retrieve the value of the register at that point
when the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.
Differential Revision: https://reviews.llvm.org/D79978
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.
Differential Revision: https://reviews.llvm.org/D83709
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.
I've added a test to
llvm/test/CodeGen/AArch64/sve-fp.ll
that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.
Differential Revision: https://reviews.llvm.org/D83572
Caused by uninitialized load of llvm::DwarfDebug::PrevCU:
llvm::DwarfCompileUnit::addRange () at ../lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:276
llvm::DwarfDebug::endFunctionImpl () at ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1586
llvm::DebugHandlerBase::endFunction () at ../lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:319
llvm::AsmPrinter::EmitFunctionBody () at ../lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1230
llvm::ARMAsmPrinter::runOnMachineFunction () at ../lib/Target/ARM/ARMAsmPrinter.cpp:161
Most of the DebugInfo tests under `LLVM_LIT_ARGS:STRING=-sv --vg` prior to this fix, and pass with the fix applied.
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D81631
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.
x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision: https://reviews.llvm.org/D83567
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:i
http://bugs.llvm.org/PR41098http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision:
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split
Reviewers: efriedma, sdesmalen, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83577
Check that input size matches size of destination reg class.
Attempt to extend input size when needed.
Differential Revision: https://reviews.llvm.org/D83384
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert
from getVectorNumElements() to getVectorElementCount(), since this
code path works for both fixed and scalable vectors.
This fixes up one warning in the test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83196
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.
Fixes warnings in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83203
Since the `RISCVExpandPseudo` pass has been split from
`RISCVExpandAtomicPseudo` pass, it would be nice to run the former as
early as possible (The latter has to be run as late as possible to
ensure correctness). Running earlier means we can reschedule these pairs
as we see fit.
Running earlier in the machine pass pipeline is good, but would mean
teaching many more passes about `hasLabelMustBeEmitted`. Splitting the
basic blocks also pessimises possible optimisations because some
optimisations are MBB-local, and others are disabled if the block has
its address taken (which is notionally what `hasLabelMustBeEmitted`
means).
This patch uses a new approach of setting the pre-instruction symbol on
the AUIPC instruction to a temporary symbol and referencing that. This
avoids splitting the basic block, but allows us to reference exactly the
instruction that we need to. Notionally, this approach seems more
correct because we do actually want to address a specific instruction.
This then allows the pass to be moved much earlier in the pass pipeline,
before both scheduling and register allocation. However, to do so we
must leave the MIR in SSA form (by not redefining registers), and so use
a virtual register for the intermediate value. By using this virtual
register, this pass now has to come before register allocation.
Reviewed By: luismarques, asb
Differential Revision: https://reviews.llvm.org/D82988
Summary:
When legalizing a biscast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the the operation's result size and the type specified to
the node.
This patches fix the issue by making sure the bit width of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82552
This should be a typo introduced in D69275, which may cause an unknown
segment fault in getNode.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83376
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this, and move all isel pseudo expansion to FinalizeISel. This was a
bad rebase or something, and failed to actually delete this call.
GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
It looks like 9cac4e6d14 accidentally
added a second copy of this from a bad rebase or something. This
second copy was added, and the finalizeLowering call was not deleted
as intended.
Updated the AArch64 tests the best I could with my vague, inferred
understanding of AArch64 register banks. As far as I can tell, there
is only one 32-bit/64-bit type which will use the gpr register bank,
so we have to use the fpr bank for the other operand.
This removes existing code duplication and allows us to
assert that we are handling the expected cases.
We have a list of outstanding bugs that could benefit by
handling truncated source values, so that's a possible
addition going forward.
ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS.
However, when calculating the offsets for each of the operands we
incorrectly use the element size rather than actual size and thus
the stores overlap.
Differential Revision: https://reviews.llvm.org/D83303
Summary:
The following combine currently breaks in the DAGCombiner:
```
extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x
-> extract_vector_elt a, x
```
This happens because after we have combined these nodes we have inserted nodes
that use individual instances of the vector element type. In the above example
i16. However this isn't a legal type on all backends, and when the combining pass calls
the legalizer it breaks as it expects types to already be legal. The type legalizer has
already been run, and running it again would make a mess of the nodes.
In the example code at least, the generated code is still efficient after the change.
Reviewers: miyuki, arsenm, dmgreen, lebedev.ri
Reviewed By: miyuki, lebedev.ri
Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83231
Occasionally we see absolutely massive basic blocks, typically in global
constructors that are vulnerable to heavy inlining. When these blocks are
dense with DBG_VALUE instructions, we can hit near quadratic complexity in
DwarfDebug's validThroughout function. The problem is caused by:
* validThroughout having to step through all instructions in the block to
examine their lexical scope,
* and a high proportion of instructions in that block being DBG_VALUEs
for a unique variable fragment,
Leading to us stepping through every instruction in the block, for (nearly)
each instruction in the block.
By adding this guard, we force variables in large blocks to use a location
list rather than a single-location expression, as shown in the added test.
This shouldn't change the meaning of the output DWARF at all: instead we
use a less efficient DWARF encoding to avoid a poor-performance code path.
Differential Revision: https://reviews.llvm.org/D83236
In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid
call to getVectorNumElements() with a call to getVectorMinNumElements(),
since the code path works for both fixed and scalable vectors.
This fixes up a warning in the following test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83197
Calling getVectorNumElements() is not safe for scalable vectors and we
should normally use getVectorElementCount() instead. However, for the
code changed in this patch I decided to simply move the instantiation of
the variable 'OutNumElems' lower down to the place where only fixed-width
vectors are used, and hence it is safe to call getVectorNumElements().
Fixes up one warning in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83195
For the GetElementPtr case in function
AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now as the subsequent optimisations doesn't currently support them.
This changes fixes up all warnings in the following tests:
llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
Differential Revision: https://reviews.llvm.org/D83124
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
This is inspired by D81648. The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic.
This is not NFC in that the handling of constants larger than > 64 bit has changed. The old lowering would crash on values which could not be encoded as a sign extended 64 bit value. The new lowering just spills all constants > 64 bits. We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.
handleMoveDown or handleMoveUp cannot properly repair a main
range of a LiveInterval since they only get LiveRange. There
is a problem if certain use has moved few segments away and
there is a hole in the main range in between of these two
locations. We may get a SubRange with a very extended Segment
spanning several Segments of the main range and also spanning
that hole. If that happens then we end up with the main range
not covering its SubRange which is an error.
It might be possible to attempt fixing the main range in place
just between of the old and new index by extending all of its
Segments in between, but it is unclear this logic will be
faster than just straight constructMainRangeFromSubranges,
which itself is pretty cheap since it only contains interval
logic. That will also require shrinkToUses() call after which
is probably even more expensive.
In the test second move is from 64B to 92B for the sub1.
Subrange is correctly fixed:
L000000000000000C [16r,32B:0)[32B,92r:1) 0@16r 1@32B-phi
But the main range has a hole in between 80d and 88r after
updateRange():
%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2)
Since source position is 64B this segment is not even considered
by the updateRange().
Differential Revision: https://reviews.llvm.org/D82916
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83041
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.
I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.
Differential Revision: https://reviews.llvm.org/D82524
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyPartsToVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
Differential Revision: https://reviews.llvm.org/D83028
X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z)
In the motivating case from PR46406:
https://bugs.llvm.org/show_bug.cgi?id=46406
...this is restoring the sequence that was originally in the source code.
We extracted a term from within the sqrt because we do not know in
instcombine whether a target will expand a sqrt call.
Note: we could say that the transform in IR should be restricted, but
that would not solve the problem if the source was originally in the
pattern shown here.
This is a gray area for fast-math-flag requirements. I think we should at
least check fast-math-flags on the fdiv and fmul because I view this
transform as 2 pieces: reassociate the fmul operands and form reciprocal
from the fdiv (as with the existing transform). We could argue that the
sqrt also needs FMF, but that was not required before, so we should change
that in a follow-up patch if that seems better.
We don't currently have a way to check that the target will produce a sqrt
or recip estimate without actually creating nodes (the APIs are SDValue
getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up
speculatively created nodes if we are not able to create an estimate.
The x86 test with doubles verifies that we are not changing a test with
no estimate sequence.
Differential Revision: https://reviews.llvm.org/D82716
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.
New methods are introduced to cover common access patterns.
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83083
Summary:
When a desired symbol name contains invalid character that the
system assembler could not process, we need to emit .rename
directive in assembly path in order for that desired symbol name
to appear in the symbol table.
Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L
Differential Revision: https://reviews.llvm.org/D82481
This matches the DAG behavior where this is called after the loop
checking for calls. The AMDGPU implementation depends on knowing if
there are calls in the function or not, so move this later.
Another problem is finalizeLowering is actually called twice; I was
seeing weird inconsistencies since the first call would produce
unexpected results and the second run would correct them in some
contexts. Since this requires disabling the verifier, and it's useful
to serialize the MIR immediately after selection, FinalizeISel should
probably not be a real pass.
Use a simpler code sequence when the shift amount is known not to be
zero modulo the bit width.
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
Differential Revision: https://reviews.llvm.org/D82540
Using a negation instead of a subtraction from a constant can save an
instruction on some targets.
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
Differential Revision: https://reviews.llvm.org/D82539
We need to ensure that the sign bits of the result all match
so we can't fold to undef.
Similar to PR46585.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D83163
zext_vector_inreg needs to produces 0s in the extended bits and
sext_vector_inreg needs to produce upper bits that are all the
same. So we should fold them to a 0 vector instead of undef.
Fixes PR46585.
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.
Before this its likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.
Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.
Fixes PR37890
Given a loop with two subloops, it should be possible for both to be
converted to hardware loops. That's what this patch does, simply enough.
It slightly alters the loop iterating order to try and convert all
subloops. If one (or more) succeeds, it stops as before.
Differential Revision: https://reviews.llvm.org/D78502
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
Differential Revision: https://reviews.llvm.org/D82602
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).
When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:
extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
This patch fixes both issues.
Reviewers: david-arm, efriedma, spatel
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82910
Whilst trying to assemble the following test:
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c
I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.
Differential Revision: https://reviews.llvm.org/D82744
This patch uses ranges for debug information when a function contains basic block sections rather than using [lowpc, highpc]. This is also the first in a series of patches for debug info and does not contain the support for linker relaxation. That will be done as a follow up patch.
Differential Revision: https://reviews.llvm.org/D78851
The caller can't handle the node having multiple results like a
masked load does. So we need to detect the case and do our own
result replacement.
Fixes PR46532.
In visitSCALAR_TO_VECTOR we try to optimise cases such as:
scalar_to_vector (extract_vector_elt %x)
into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.
This change fixes up one of the warnings in:
llvm/test/CodeGen/AArch64/sve-merging-stores.ll
I've also added a simplified version of the same test to:
llvm/test/CodeGen/AArch64/sve-fp.ll
which already has checks for no warnings.
Differential Revision: https://reviews.llvm.org/D82872
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker. Which stops outlining making codesize
worse in c++ code with no-exceptions.
Differential Revision: https://reviews.llvm.org/D57254
As per documentation of `hasPairLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.
Differential Revision: https://reviews.llvm.org/D82958
It's perfectly valid to do certain DAG combines where we extract
subvectors from a concat vector when we have scalable vector types.
However, we can do this in a way that avoids generating compiler
warnings by replacing calls to getVectorNumElements() with
getVectorMinNumElements(). Due to the way subvector extracts are
designed to work with scalable vector types this is ok.
This eliminates some warnings from existing tests in this file:
llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D82655
Summary:
This is a fix for PR45009.
When working on D67492 I made DwarfExpression emit a single
DW_OP_entry_value operation covering the whole composite location
description that is produced if a register does not have a valid DWARF
number, and is instead composed of multiple register pieces. Looking
closer at the standard, this appears to not be valid DWARF. A
DW_OP_entry_value operation's block can only be a DWARF expression or a
register location description, so it appears to not be valid for it to
hold a composite location description like that.
See DWARFv5 sec. 2.5.1.7:
"The DW_OP_entry_value operation pushes the value that the described
location held upon entering the current subprogram. It has two
operands: an unsigned LEB128 length, followed by a block containing a
DWARF expression or a register location description (see Section
2.6.1.1.3 on page 39)."
Here is a dwarf-discuss mail thread regarding this:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-March/004610.html
There was not a strong consensus reached there, but people seem to lean
towards that operations specified under 2.6 (e.g. DW_OP_piece) may not
be part of a DWARF expression, and thus the DW_OP_entry_value operation
can't contain those.
Perhaps we instead want to emit a entry value operation per each
DW_OP_reg* operation, e.g.:
- DW_OP_entry_value(DW_OP_regx sub_reg0),
DW_OP_stack_value,
DW_OP_piece 8,
- DW_OP_entry_value(DW_OP_regx sub_reg1),
DW_OP_stack_value,
DW_OP_piece 8,
[...]
The question then becomes how the call site should look; should a
composite location description be emitted there, and we then leave it up
to the debugger to match those two composite location descriptions?
Another alternative could be to emit a call site parameter entry for
each sub-register, but firstly I'm unsure if that is even valid DWARF,
and secondly it seems like that would complicate the collection of call
site values quite a bit. As far as I can tell GCC does not emit any
entry values / call sites in these cases, so we do not have something to
compare with, but the former seems like the more reasonable approach.
Currently when trying to emit a call site entry for a parameter composed
of multiple DWARF registers a (DwarfRegs.size() == 1) assert is
triggered in addMachineRegExpression(). Until the call site
representation is figured out, and until there is use for these entry
values in practice, this commit simply stops the invalid DWARF from
being emitted.
Reviewers: djtodoro, vsk, aprantl
Reviewed By: djtodoro, vsk
Subscribers: jyknight, hiraditya, fedor.sergeev, jrtc27, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D75270
While validating live-out values, record instructions that look like
a reduction. This will comprise of a vector op (for now only vadd),
a vorr (vmov) which store the previous value of vadd and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.
Differential Revision: https://reviews.llvm.org/D75533
It's pretty silly to diagnose on a scalar copy but the build does that:
loop variable 'SibReg' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]
With an undef operand, it's possible for getVRegDef to fail and return
null. This is an edge case very little code bothered to
consider. Proper gMIR should use G_IMPLICIT_DEF instead.
I initially tried to apply this restriction to all SSA MIR, so then
getVRegDef would never fail anywhere. However, ProcessImplicitDefs
does technically run while the function is in SSA. ProcessImplicitDefs
and DetectDeadLanes would need to either move, or a new pseudo-SSA
type of function property would need to be introduced.
Basically a NFC, but allows subclasses access to the entire PeelingModuloScheduleExpander
class. We are doing this to allow backends, particularly one that are not necessarily
upstreamed, to inherit from PeelingModuloScheduleExpander and access its basic structures.
Renames Info into LoopInfo for consistency in PeelingModuloScheduleExpander.
Differential Revision: https://reviews.llvm.org/D82673
In RISC-V vector extension, users could group multiple vector registers
as one pseudo register. In mixed width operations, users could use
partial vector registers to reduce the register pressure. The parameter
to control register grouping and partial use is called LMUL. LMUL is a
part of the type. So, we have a bunch of vector types. In order to
support all these types, we need new MVT types in LLVM. In this patch, I
added several MVT types that are used in RISC-V vector implementation.
This is a standalone patch for MVT types without RISC-V related implementation.
Differential revision: https://reviews.llvm.org/D81724
This is a followup on D78403.
I'm unsure about `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`, shouldn't `getAlign` provide the correct answer already?
Differential Revision: https://reviews.llvm.org/D81369
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D82746
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of a SVE vector element and
therefore bail out.
Differential Revision: https://reviews.llvm.org/D82564
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.
This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.
Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.
Differential Revision: https://reviews.llvm.org/D82257
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen.
We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors.
It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one.
Fixes PR46461.
The translation of cmpxchg added by
9481399c0f specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.
This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
Differential Revision: https://reviews.llvm.org/D82292
Summary:
The printer seems to intend to not print the trailing comma but has a
copy-paste error for the last value in the escape, and the parser
enforces having no trailing comma, but somehow a test was never included
to actually confirm it.
Reviewers: thegameg, arsenm
Reviewed By: thegameg, arsenm
Subscribers: wdng, arsenm, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82478
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
Implement them on top of sdiv/udiv, similar to what we do for integer
types.
Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
Differential Revision: https://reviews.llvm.org/D81511
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.
Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:
V = load(Addr) =>
V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))
store(V, (Addr)) =>
masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))
Reviewers: rengolin, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80385
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.
This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html<Paste>
Differential Revision: https://reviews.llvm.org/D81852
We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.
We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phi's are passed across basic block boundaries. This has to be done in
CodegenPrepare as it naturally straddles basic blocks.
The alorithm just looks from phi nodes, looking at uses and operands for
a collection of nodes that all together are bitcast between float and
integer types. We record visited phi nodes to not have to process them
more than once. The whole subgraph is then replaced with a new type.
Loads and Stores are bitcast to the correct type, which should then be
folded into the load/store, changing it's type.
This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.
I also added undef and extract element handling which increased the
potency in some cases.
This adds it with an option that defaults to off, and disabled for 32bit
X86 due to potential issues around canonicalizing NaNs.
Differential Revision: https://reviews.llvm.org/D81827
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:
1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
vectors.
For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.
I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to
CodeGen/AArch64/GlobalISel/arm64-fallback.ll
that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.
Differential Revision: https://reviews.llvm.org/D81557
Without this fix, handleMoveUp can create an invalid live range like
this:
[98904e,98908r:0)[98908e,227504r:1)
where the two segments overlap, but only because we have lost the "e"
(early-clobber) on the end point of the first segment.
Differential Revision: https://reviews.llvm.org/D82110
For now I have changed SimplifyDemandedBits and it's various callers
to assume we know nothing for scalable vectors and to ignore the
demanded bits completely. I have also done something similar for
SimplifyDemandedVectorElts. These changes fix up lots of warnings
due to calls to EVT::getVectorNumElements() for types with scalable
vectors. These functions are all used for optimisations, rather than
functional requirements. In future we can revisit this code if
there is a need to improve code quality for SVE.
Differential Revision: https://reviews.llvm.org/D80537
When trying to calculate the number of sign bits for scalable vectors
we should just bail out for now and pretend we know nothing.
Differential Revision: https://reviews.llvm.org/D81093
Summary:
Extend StackLifetime with option to calculate liveliness
where alloca is only considered alive on basic block entry
if all non-dead predecessors had it alive at terminators.
Depends on D82043.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82124
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.
This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add back the common
parameters back in addition to the helper.
I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.
This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
We're missing a plain English explanation of how this pass is supposed
to operate -- add one to the file comment.
Differential Revision: https://reviews.llvm.org/D80929
Added NextPowerOf2() routine to TypeSize and rewritten the code
in getVectorTypeBreakdown to avoid warnings being generated.
Differential Revision: https://reviews.llvm.org/D81578
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.
I discovered these warnings when compiling the structured loads tests in
this file:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81936
Summary:
This invariant is being violated in the test case
https://reviews.llvm.org/D77849, related to the use of the relatively
new ability for callbr to have return values, and MachineBasicBlocks
with INLINEASM_BR terminators to emit live out register defs.
As noted in the comment, this triggers invariant violations in
MachineVerifier via `llc -verify-machineinstrs` or
`llc -verify-regalloc`, since only MachineInstrs that are terminators
are allowed to follow the first terminator.
https://reviews.llvm.org/D75098 may rework this very assertion if we're
spilling via a (proposed) TCOPY MachineInstr.
Reviewers: void, efriedma, arsenm
Reviewed By: efriedma
Subscribers: qcolombet, wdng, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78166
When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience causing an unexpected
jump in stepping at -Og.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).
Differential Revision: https://reviews.llvm.org/D81437
Summary:
Add a flag to omit the xray_fn_idx to cut size overhead and relocations
roughly in half at the cost of reduced performance for single function
patching. Minor additions to compiler-rt support per-function patching
without the index.
Reviewers: dberris, MaskRay, johnislarry
Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81995
Summary:
This code is going to be used in StackSafety.
This patch is file move with minimal changes. Identifiers
will be fixed in the followup patch.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81831
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
Differential Revision: https://reviews.llvm.org/D81875
The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.
Differential Revision: https://reviews.llvm.org/D81879
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
I noticed that add combines are some of the most commonly hit across CTMark,
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
Summary:
Teach MachineVerifier to check branches for MBB operands if they are not declared indirect.
Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`.
Without these, `MachineInstr.isConditionalBranch()` was giving a
false-positive for those instructions.
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587
This patch tries to reassociate two patterns related to FMA to expose
more ILP on PowerPC.
// Pattern 1:
// A = FADD X, Y (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMA X, M21, M22
// B = FMA Y, M31, M32
// C = FADD A, B
// Pattern 2:
// A = FMA X, M11, M12 (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMUL M11, M12
// B = FMA X, M21, M22
// D = FMA A, M31, M32
// C = FADD B, D
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D80175
Current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (result is nan, not expected inf).
And this change exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the
divisor is some constant. But it doesn't work after legalized on some
platforms. This patch restricts the method to act before LegalDAG.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D80542
This decreases the time consumed by the pass [during RawSpeed unity build]
by 25% (0.0586 s -> 0.04388 s).
While that isn't really impressive overall, that wasn't the goal here.
The memory results here are noticeable.
The baseline results are:
```
total runtime: 55.65s.
calls to allocation functions: 19754254 (354960/s)
temporary memory allocations: 4951609 (88974/s)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.79MB
total memory leaked: 198.01MB
```
While with this patch the results are:
```
total runtime: 55.37s.
calls to allocation functions: 19068237 (344403/s) # -3.47 %
temporary memory allocations: 4261772 (76974/s) # -13.93 % (!!!)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.73MB
total memory leaked: 198.01MB
```
So we get rid of *a lot* of temporary allocations.
Using `SmallSet<8>` makes sense to me because at least here
for x86 BdVer2, the size of that set is *never* more than 3,
over all of llvm test-suite + RawSpeed.
The story might be different on other targets,
not sure if it will ever justify whole DenseSet,
but if it does SmallDenseSet might be a compromise.
SUMMARY:
Since we deal with aix emitLinkage in the PPCAIXAsmPrinter::emitLinkage() in the patch https://reviews.llvm.org/D75866. It do not go to AsmPrinter::emitLinkage() any more, we clean up some aix related code in the AsmPrinter::emitLinkage()
Reviewers: Jason liu
Differential Revision: https://reviews.llvm.org/D81613
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.
Differential Revision: https://reviews.llvm.org/D81666
Summary:
Fix crash when using -debug caused by the GlobalISel observer trying to print
an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder
using buildInstr, which immediately inserts the instruction causing print,
instead of using BuildMI to first build up the instruction and using
insertInstr when finished.
Add RUN-line to existing debug-insts.ll test with -debug flag set to make sure
no crash is happening.
Also fixed a missing %s in the 2nd RUN-line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D80437
If the target explicitly requested custom legalization, it should be
required to implement this. Also move default legalizeIntrinsic
implementation into the header so it's next to the related
legalizeCustom.
The memory folding raplaced the old instruction without copying the symbols assigned. Which will resulted in built fail due to the lost symbols.
Reviewed by craig.topper
Differential Revision: https://reviews.llvm.org/D78471
SUMMARY:
in the aix assembly , it do not have .hidden and .protected directive.
in current llvm. if a function or a variable which has visibility attribute, it will generate something like the .hidden or .protected , it can not recognize by aix as.
in aix assembly, the visibility attribute are support in the pseudo-op like
.extern Name [ , Visibility ]
.globl Name [, Visibility ]
.weak Name [, Visibility ]
in this patch, we implement the visibility attribute for the global variable, function or extern function .
for example.
extern __attribute__ ((visibility ("hidden"))) int
bar(int* ip);
__attribute__ ((visibility ("hidden"))) int b = 0;
__attribute__ ((visibility ("hidden"))) int
foo(int* ip){
return (*ip)++;
}
the visibility of .comm linkage do not support , we will have a separate patch for it.
we have the unsupported cases ("default" and "internal") , we will implement them in a a separate patch for it.
Reviewers: Jason Liu ,hubert.reinterpretcast,James Henderson
Differential Revision: https://reviews.llvm.org/D75866
It was annoying enough that every custom lowering needed to set the
insert point, but this was made worse since now these all needed to be
updated to setInstrAndDebugLoc. Consolidate these so every
legalization action has the right insert position by default.
This should fix dropping debug info in every custom AMDGPU
legalization.
The current relationship between LegalizerHelper and MachineIRBuilder
confuses me, because the LegalizerHelper modifies the MachineIRBuilder
which it does not own. Constructing a LegalizerHelper destroys the
insert point, since the constructor calls setMF, which clears all the
fields. Try to separate these functions, so it's possible to construct
a LegalizerHelper from an existing MachineIRBuilder without losing the
insert point/debug loc.
The construction APIs for MachineIRBuilder don't make much sense, and
it's been annoying to sort through it with these trivial functions
separate from the declaration.
New instructions were getting printed both in createdInstr, and in the
final printNewInstrs, so it made it look like the same instructions
were created twice. This overall made reading the debug output
harder. Stop printing the initial construction and only print new
instructions in the summary at the end. This avoids printing the less
useful case where instructions are sometimes initially created with no
operands.
I'm not sure this is the correct instance to remove; now the visible
ordering is different. Now you will typically see the one erased
instruction message before all the new instructions in order. I think
this is the more logical view of typical legalization changes,
although it's mechanically backwards from the normal
insert-new-erase-old pattern.
If a resource can be held for multiple cycles in the schedule model
then an instruction can be placed into the available queue, another
instruction can be scheduled, but the first will not be taken back out if
the two instructions hazard. To fix this make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.
This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.
Differential Revision: https://reviews.llvm.org/D76909
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.
This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.
'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
Differential Revision: https://reviews.llvm.org/D80801
Summary:
The naked function attribute is meant to suppress all function
prologue/epilogue instructions.
On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.
Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().
Checking in ARMFrameLowering::determineCalleeSaves() is too late.
A test case is included.
Reviewers: llvm-commits, olista01, danielkiss
Reviewed By: danielkiss
Subscribers: kristof.beyls, hiraditya, danielkiss
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80715
Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81374
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81379
Summary:
Currently, MachineVerifier will attempt to verify that tied operands
satisfy register constraints as soon as the function is no longer in
SSA form. However, PHIElimination will take the function out of SSA
form while TwoAddressInstructionPass will actually rewrite tied operands
to match the constraints. PHIElimination runs first in the pipeline.
Therefore, whenever the MachineVerifier is run after PHIElimination,
it will encounter verification errors on any tied operands.
This patch adds a function property called TiedOpsRewritten that will be
set by TwoAddressInstructionPass and will control when the verifier checks
tied operands.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D80538
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
Commit d77ae1552f ("[DebugInfo] Support to emit debugInfo
for extern variables") added support to emit debuginfo
for extern variables. Currently, only BPF target enables to
emit debuginfo for extern variables.
But if the extern variable has "void" type, the compilation will
fail.
-bash-4.4$ cat t.c
extern void bla;
void *test() {
void *x = &bla;
return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
missing global variable type
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
isLocal: false, isDefinition: false)
...
fatal error: error in backend: Broken module found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace,
preprocessed source, and associated run script.
Stack dump:
...
The IR requires a DIGlobalVariable must have a valid type and the
"void" type does not generate any type, hence the above fatal error.
Note that if the extern variable is defined as "const void", the
compilation will succeed.
-bash-4.4$ cat t.c
extern const void bla;
const void *test() {
const void *x = &bla;
return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
-bash-4.4$ cat t.ll
...
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
type: !6, isLocal: false, isDefinition: false)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: null)
...
Since currently, "const void extern_var" is supported by the
debug info, it is natural that "void extern_var" should also
be supported. This patch disabled assertion of "void extern_var"
in IR verifier and add proper guarding when emiting potential
null debug info type to dwarf types.
Differential Revision: https://reviews.llvm.org/D81131
This moves the SuffixTree test used in the Machine Outliner and moves it into Support for use in other outliners elsewhere in the compilation pipeline.
Differential Revision: https://reviews.llvm.org/D80586
We sometimes have functions with large numbers of sibling basic
blocks (usually with an error path exit from each one). This was
triggering the qudratic behavior in this function - after visiting
each child llvm would re-scan the parent from the beginning again. We
modify the work stack to record the next index to be worked on
alongside the pointer. This avoids the need to linearly search for
the next unfinished child.
Differential Revision: https://reviews.llvm.org/D80029
Summary: This is a followup on D81196.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81362
Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::HandleByVal` without marking it `override`.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81365
Scalable vectors cannot use 'BUILD_VECTOR', so it is necessary to
properly split and widen scalable vectors when passing them
to CopyToReg/CopyFromReg.
This functionality is added to TargetLoweringBase::getVectorTypeBreakdown().
This patch only adds support for 'splitting' scalable vectors that
are a multiple of some legal type, e.g.
<vscale x 6 x i64> -> 3 x <vscale x 2 x i64>
Reviewers: efriedma, c-rhodes
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80139
There's two properties we want to verify:
1. That the successors returned by analyzeBranch are in the CFG
successor list, and
2. That there are no extraneous successors are in the CFG successor
list.
The previous implementation mostly accomplished this, but in a very
convoluted manner.
Differential Revision: https://reviews.llvm.org/D79793
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky propspect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in
particular would be problematic, because its successor blocks are not
distinct from "normal" successors, as EHPads are.)
Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.
Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.
Differential Revision: https://reviews.llvm.org/D79605
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.
Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.
Also stop using MaybeAlign.
PendingInLocs ends up having the same value as InLocs, just computed
a bit more indirectly. It is a leftover of a previous implementation
approach.
This patch drops PendingInLocs, as well as the Diff and Removed
calulations, which are no longer needed.
Differential Revision: https://reviews.llvm.org/D80868
This patch updates TargetLoweringBase::computeRegisterProperties and
TargetLoweringBase::getTypeConversion to support scalable vectors,
and make the right calls on how to legalise them. These changes are required
to legalise both MVTs and EVTs.
Reviewers: efriedma, david-arm, ctetreau
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80640
Current implementation of emitPatchpoint() is very inefficient:
for every FrameIndex operand if creates new MachineInstr with
that operand expanded and all other copied as is.
Since PATCHPOINT/STATEPOINT instructions may have *a lot* of
FrameIndex operands, we end up creating and erasing many
machine instructions. But we can do it in single pass, with only
one new machine instruction generated.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81181
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.
EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.
For example:
```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:
```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79587
Summary:
Cache the results from getMachineBasicBlocks in LexicalScopes to speed
up UserValueScopes::dominates queries. This replaces the caching done
in UserValueScopes. Compared to the old caching method, this reduces
memory traffic when a VarLoc is copied (e.g. when a VarLocMap grows),
and enables caching across basic blocks.
When compiling sqlite 3.5.7 (CTMark version), this patch reduces the
number of calls to getMachineBasicBlocks from 10,207 to 1,093. I also
measured a small compile-time reduction (~ 0.1% of total wall time, on
average, on my machine).
As a drive-by, I made the DebugLoc in UserValueScopes a const reference
to cut down on MetadataTracking traffic.
Reviewers: jmorse, Orlando, aprantl, nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80957
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats that we were extracting from the splatted vector result instead of the source, the splat index is only valid for the source vector not the result, which may contain undefs, including at the splat index.
This reverts commit 21dadd774f.
In at least PromoteIntBinOps, they wanted to know about users of *all* values
produced by the node not just the integer being promoted. For example not
replacing chain users if the operation was a load breaks the ordering of the
DAG.
Summary:
This patch adds support for dumping .dot
representation of SelectionDAG. It is inspired from the fact that,
a developer may want to just dump the graph at
a predictable path with a simple name to compare.
The exisitng utility (i.e. viewGraph) are overkill
for this motive hence this patch adds the requires support
while using the core routines from GraphWriter.
Example usage: DAG.dumpDotGraph("/tmp/graph.dot", "MyGraph")
will create /tmp/graph.dot file when DAG is an
object of SelectionDAG class.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D80711
To do so, I had to sink the old school inline operand handling into GCStatepointInst which is non ideal. This code should be removed shortly and I was able to at least clean it up a bunch.
The AMDGPU lowering for unconstrained G_FDIV sometimes needs to
introduce a mode switch in the middle, so it's helpful to have
constrained instructions available to legalize this. Right now nothing
is preventing reordering of the mode switch with the other
instructions in the expansion.
When we rematerialize a value as part of the coalescing, we may
widen the register class of the destination register.
When this happens, updateRegDefUses may create additional subranges
to account for the wider register class.
The created subranges are empty and if they are not defined by
the rematerialized instruction we clean them up.
However, if they are defined by the rematerialized instruction but
unused, we failed to flag them as dead definition and would leave
them as empty live-range.
This is wrong because empty live-ranges don't interfere with anything,
thus if we don't fix them, we would fail to account that the
rematerialized instruction clobbers some lanes.
E.g., let us consider the following pseudo code:
def.lane_low64:reg128 = ldimm
newdef:reg32 = COPY def.lane_low64_low32
When rematerialization happens for newdef, we end up with:
newdef.lane_low64:reg128 = ldimm
= use newdef.lane_low64_low32
Let's look at the live interval of newdef.
Before rematerialization, we would get:
newdef [defIdx, useIdx:0) 0@defIdx
Right after updateRegDefUses, newdef register class is widen to reg128
and the subrange definitions will be augmented to fill the subreg that
is used at the definition point, here lane_low64.
The resulting live interval would be:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
* lane_low64_high32 EMPTY
* lane_low64_low32 [newDefIdx, useIdx:0)
Before this patch this would be the final status of the live interval.
Therefore we miss that lane_low64_high32 is actually live on the
definition point of newdef.
With this patch, after rematerializing, we check all the added subranges
and for the ones that are defined but empty, we flag them as dead def.
Thus, in that case, newdef would look like this:
newdef [newDefIdx, useIdx:0) 0@newDefIdx
* lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY
* lane_low64_low32 [newDefIdx, useIdx:0)
This fixes https://www.llvm.org/PR46154
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
In the function "Analysis.cpp:isInTailCallPosition", it only checks whether
a call is in a tail call position if the call has side effects, access memory
or it is not safe to speculative execute. Therefore, a speculatable function
will not go through tail call position check and improperly tail called when
it is not in a tail-call position. This patch enables tail call position check
for speculatable functions.
Differential Revision: https://reviews.llvm.org/D80661
Summary:
In the patch D73152, it adds a new function LiveVariables::addNewBlock.
This new function will add the reg which PHI used to the MBB which reg
is from.
But the new function may cause LiveVariable Verification failed when the
Src reg in PHI is undef.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80077
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.
This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.
Differential Revision: https://reviews.llvm.org/D80869
Move TargetFrameLowering.h include to the top of the TargetFrameLoweringImpl.cpp includes (clang-format doesn't do this by default as the filenames don't match).
This adds call site info support for call instructions with delay slot.
Search for instructions inside call delay slot, which load value
into parameter forwarding registers.
Return address of the call points to instruction after call delay slot,
which is not the one, immediately after the call instruction.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78107
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.
For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.
Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.
The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal
Differential Revision: https://reviews.llvm.org/D78272
The collectCallSiteParameters() method searches for instructions
which load values into registers used for parameters passing.
Previously, interpretation of those values, loaded by one such
instruction, was implemented inside collectCallSiteParameters() method.
This patch moves the interpretation code from collectCallSiteParameters()
method into a separate static method named interpretValue. New method is
called from collectCallSiteParameters() to process each instruction from
targeted instruction scope.
The collectCallSiteParameters() searches for loaded parameter value
among instructions which precede the call instruction, inside the same
basic block. When needed, new method (interpretValue) could be used for
searching any instruction scope.
This is preparation for search of parameter value, loaded inside call
delay slot.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78106
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.
+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.
Differential Revision: https://reviews.llvm.org/D68049
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
These cases all follow the same pattern:
struct A {
friend class X;
//...
class X {};
};
But 'friend class X;' injects 'X' into the surrounding namespace scope,
rather than introducing a class member. So the second 'class X {}' is a
completely different type, which changes the meaning of the earlier name
'X' from '::X' to 'A::X'.
Additionally, the friend declaration is pointless -- members of a class
don't need to be befriended to be able to access private members.
Summary:
Instead of iterating over all VarLoc IDs in removeEntryValue(), just
iterate over the interval reserved for entry value VarLocs. This changes
the iteration order, hence the test update -- otherwise this is NFC.
This appears to give an ~8.5x wall time speed-up for LiveDebugValues when
compiling sqlite3.c 3.30.1 with a Release clang (on my machine):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
Before: 2.5402 ( 18.8%) 0.0050 ( 0.4%) 2.5452 ( 17.3%) 2.5452 ( 17.3%) Live DEBUG_VALUE analysis
After: 0.2364 ( 2.1%) 0.0034 ( 0.3%) 0.2399 ( 2.0%) 0.2398 ( 2.0%) Live DEBUG_VALUE analysis
```
The change in removeEntryValue() is the only one that appears to affect
wall time, but for consistency (and to resolve a pending TODO), I made
the analogous changes for iterating over SpillLocKind VarLocs.
Reviewers: nikic, aprantl, jmorse, djtodoro
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80684
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.
Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.
Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.
Test will be included in future AMDGPU patch.
Summary:
While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80545
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168.
I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.
define i11129 @test1() {
%L1 = load i11129, i11129* undef
%B30 = ashr i11129 %L1, %L1
store i11129 %B30, i11129* undef
ret i11129 %L1
}
This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not safe
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.
Reviewers: MatzeB, atrick, efriedma, niravd, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D59722
This code was repeated in two callers of CommitTargetLoweringOpt.
But CommitTargetLoweringOpt is also called from TargetLowering.
We should print a message for those calls to. So sink the
repeated code into CommitTargetLoweringOpt to catch those calls.
We are calling getValidShiftAmountConstant first followed by getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant if that failed. But both are used in the same way in ComputeNumSignBits and the Min/Max variants call getValidShiftAmountConstant internally anyhow.
This patch adds support for emission of following DWARFv5 macro
forms in .debug_macro.dwo section:
- DW_MACRO_start_file
- DW_MACRO_end_file
- DW_MACRO_define_strx
- DW_MACRO_undef_strx
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D78866
Summary:
This caused incorrect debug information for parameters:
Previously, after a COPY of a parameter that changes the width,
we would emit a DBG_VALUE that continues to be associated to that
parameter, even though it now used a different width.
This made the LiveDebugValues pass assume the parameter value
got clobbered and it stopped tracking the parameter entry
value, leading to incorrect debug information.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39715
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80819
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given
Differential Revision: https://reviews.llvm.org/D79537
DW_MACRO_define_strx forms are supported now in llvm-dwarfdump and these
forms can be used in both debug_macro[.dwo] sections. An added advantage
for using strx forms over strp forms is that it uses indices
approach instead of a relocation to debug_str section.
This patch unify the emission for debug_macro section.
Reviewed by: dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D78865
Since on AIX, our strategy is to not use -u to suppress any undefined
symbols, we need to emit .extern for the symbols with AvailableExternally
linkage.
Differential Revision: https://reviews.llvm.org/D80642
optimizations
As discussed in the thread http://lists.llvm.org/pipermail/llvm-dev/2020-May/141838.html,
some bit field access width can be reduced by ReduceLoadOpStoreWidth, some
can't. If two accesses are very close, and the first access width is reduced,
the second is not. Then the wide load of second access will be stalled for long
time.
This patch add command line options to guard ReduceLoadOpStoreWidth and
ShrinkLoadReplaceStoreWithStore, so users can use them to disable these
store width reduction optimizations.
Differential Revision: https://reviews.llvm.org/D80745
Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it should be unlikely that the additional shuffle
would be more efficient than a insert_vector_elt.
Additionally, this fixes a infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns a insert_vector_elt into a shuffle,
which gets turned back into a insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).
Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.
There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The reason that we need a
second mov in ins1f2 is that we have to move the result to the result
register and is not really related to the DAGCombine fold I think.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75 for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20)
Reviewers: spatel, efriedma, dmgreen, RKSimon
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D80710
AddressingModeMatcher::matchScaledValue was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t
Fixes OSSFuzz#22723 which is a followup to rGc479052a74b2 (PR46004 / OSSFuzz#22357)
Summary:
The description of EXTACT_SUBVECTOR and INSERT_SUBVECTOR has been
changed to accommodate scalable vectors (see ISDOpcodes.h). This
patch updates the asserts used to verify these requirements when
using SelectionDAG's getNode interface.
This patch introduces the MVT function getVectorMinNumElements
that can be used against fixed-length and scalable vectors when
only the known minimum vector length is required.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80709
We should be using getVectorElementCount() to assert that two types
have the same numbers of elements. I encountered the warnings while
compiling this test:
CodeGen/AArch64/sve-intrinsics-ld1.ll
Differential Revision: https://reviews.llvm.org/D80616
I have tried to ensure that SelectionDAG and DAGCombiner do
sensible things for scalable vectors, and added support for a
limited number of simple folds. Codegen support for the vector
extract patterns have also been added to the AArch64 backend.
New vector extract tests have been added here:
CodeGen/AArch64/sve-extract-element.ll
and I have also added new folds using inserts and extracts here:
CodeGen/AArch64/sve-insert-element.ll
Differential Revision: https://reviews.llvm.org/D80208
During legalization we can end up with extends of loads, which in the case of
zexts causes us to not hit tablegen imported patterns.
The caveat here is that we don't want anyext load forming, since some variants
are illegal. This change also prevents the combine from creating any illegal
loads.
Differential Revision: https://reviews.llvm.org/D80458
I get confused by a lot of the predicate names here, since I would
assume they apply to vectors as well. Rename to reflect they only
apply to scalars.
Also add a few predicates AMDGPU uses that should be generally useful.
Also add any() to complement all. I've wanted to use this a few times
but then worked around it not being there.
Summary:
We received a report of LiveDebugValues consuming 25GB+ of RAM when
compiling code generated by Unity's IL2CPP scripting backend.
There's an initial 5GB spike due to repeatedly copying cached lists of
MachineBasicBlocks within the UserValueScopes members of VarLocs.
But the larger scaling issue arises due to the fact that prior to range
extension, there are 81K basic blocks and 156K DBG_VALUEs: given enough
memory, LiveDebugValues would insert 101 million MIs (I counted this by
incrementing a counter inside of VarLoc::BuildDbgValue).
It seems like LiveDebugValues would have to be rearchitected to support
this kind of input (we'd need some new represntation for DBG_VALUEs that
get inserted into ~every block via flushPendingLocs). OTOH, large globs
of auto-generated code are typically not debugged interactively.
So: add cutoffs to disable range extension when the input is too big. I
chose the cutoffs experimentally, erring on the conservative side. When
compiling a large collection of Apple software, range extension never
got disabled.
rdar://63418929
Reviewers: aprantl, friss, jmorse, Orlando
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80662
Summary:
Verify that each DBG_VALUE has a debug location. This is required by
LiveDebugValues, and perhaps by other late passes.
There's an exception for tests: lots of tests use a two-operand form of
DBG_VALUE for convenience. There's no reason to prevent that.
This is an extension of D80665, but there's no dependency.
Reviewers: aprantl, jmorse, davide, chrisjackson
Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80670
Summary:
Assert that MachineLICM does not move a debug instruction and then drop
its debug location. Later passes require each debug instruction to have
a location.
Testing: check-llvm, clang stage2 RelWithDebInfo build (x86_64)
Reviewers: aprantl, davide, chrisjackson, jmorse
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80665
These are the two operand sets which are expected to survive more than another week or so. Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.
For those following along, this is likely to be the last (major) change in this sequence for about a week. I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces). Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
I'd apparently only grepped in the lib directories and missed a few used in the Statepoint header itself. Beyond simple mechanical cleanup, changed the type of one routine to reflect the fact it also returns a statepoint.
Sinking logic around actual callee from Statepoint to GCStatepointInst. While doing so, adjust naming to be consistent about refering to "actual" callee and follow precedent on naming from CallBase otherwise.
Use the result to simplify one consumer. This is mostly just to ensure the new code is exercised, but is also a helpful cleanup on it's own.
While LazyBlockFrequencyInfo itself is lazy, the dominator tree
and loop info analyses it requires are not. Drop the dependency
on this pass in SelectionDAGIsel at O0.
This makes for a ~0.6% O0 compile-time improvement.
Differential Revision: https://reviews.llvm.org/D80387
This patch upgrades DISubrange to support fortran requirements.
Summary:
Below are the updates/addition of fields.
lowerBound - Now accepts signed integer or DIVariable or DIExpression,
earlier it accepted only signed integer.
upperBound - This field is now added and accepts signed interger or
DIVariable or DIExpression.
stride - This field is now added and accepts signed interger or
DIVariable or DIExpression.
This is required to describe bounds of array which are known at runtime.
Testing:
unit test cases added (hand-written)
check clang
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D80197
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
Can't test this since I can't directly use the default expansion for
AMDGPU. It needs to scale the amount by the wave size, rather than use
the raw byte size value.
If we have a memory instruction (e.g. a load), we shouldn't combine it away in
some trivial combine.
It's possible that, say, a call lives between the instructions. This could
modify the value loaded, making the load instructions not safe to fold.
Differential Revision: https://reviews.llvm.org/D80053
Use getFunctionEntryPointSymbol whenever possible to enclose the
implementation detail and reduce duplicate logic.
Differential Revision: https://reviews.llvm.org/D80402
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.
This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.
Differential Revision: https://reviews.llvm.org/D8059
Summary:
Previously, we only added early-clobber flags to the 'group' immediate flag operand
of an inline asm operand.
However, we also have to add the EarlyClobber flag to the MachineOperand itself.
This fixes PR46028
Reviewers: arsenm, leonardchan
Reviewed By: arsenm, leonardchan
Subscribers: phosek, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80467
Summary:
A struct argument can be passed-by-value to a callee via a pointer to a
temporary stack copy. Add support for emitting an entry value DBG_VALUE
when an indirect parameter DBG_VALUE becomes unavailable. This is done
by omitting DW_OP_stack_value from the entry value expression, to make
the expression describe the location of an object.
rdar://63373691
Reviewers: djtodoro, aprantl, dstenb
Subscribers: hiraditya, lldb-commits, llvm-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D80345
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.
Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
binop (splat X), (splat C) --> splat (binop X, C)
binop (splat C), (splat X) --> splat (binop C, X)
We do this in IR, and there's a similar fold for the case with 2
non-constant operands just above the code diff in this patch.
This was discussed in D79718, and the extra shuffle in the test
(llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it
was noticed disappears because demanded elements analysis is no
longer blocked. The large majority of the test diffs seem to be
benign code scheduling changes, but I do see another type of win:
moving the splat later allows binop narrowing in some cases.
Regressions were avoided on x86 and ARM with the INSERT_VECTOR_ELT
restriction.
Differential Revision: https://reviews.llvm.org/D79886
Summary:
Clean-up code around mem ops clustering logic. This patch cleans up code within
the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80119
-fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases.
Actually, a .Lfoo$local reference will be converted to a STT_SECTION (if no section relaxation) reference which will cause the section symbol (sizeof(Elf64_Sym)=24) to be generated.
-fno-semantic-interposition is currently the CC1 default. (The opposite
disables some interprocedural optimizations.) However, it does not infer
dso_local: on most targets accesses to ExternalLinkage functions/variables
defined in the current module still need PLT/GOT.
This patch makes explicit -fno-semantic-interposition infer dso_local,
so that PLT/GOT can be eliminated if targets implement local aliases
for AsmPrinter::getSymbolPreferLocal (currently only x86).
Currently we check whether the module flag "SemanticInterposition" is 0.
If yes, infer dso_local. In the future, we can infer dso_local unless
"SemanticInterposition" is 1: frontends other than clang will also
benefit from the optimization if they don't bother setting the flag.
(There will be risks if they do want ELF interposition: they need to set
"SemanticInterposition" to 1.)
For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies.
This helps with PR40502.
Differential Revision: https://reviews.llvm.org/D79003
Fixes a build issue with libc++ configured with _LIBCPP_RAW_ITERATORS (ADL not effective)
```
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:1602:3: error: no matching function for call to 'transform'
transform(HexString.begin(), HexString.end(), HexString.begin(), tolower);
^~~~~~~~~
```
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D80475
For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount.
But for the case where we are handling power-of-2 type widths, we can replace:
(sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1)
This allows x86 shifts to fold away the and-mask.
Followup to D77301 + D80466.
http://volta.cs.utah.edu:8080/z/Nod0Gr
Differential Revision: https://reviews.llvm.org/D80489
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.
Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.
This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.
Differential Revision: https://reviews.llvm.org/D78271
Disable pruning of unreachable resumes in the DwarfEHPrepare pass
at optnone. While I expect the pruning itself to be essentially free,
this does require a dominator tree calculation, that is not used for
anything else. Saving this DT construction makes for a 0.4% O0
compile-time improvement.
Differential Revision: https://reviews.llvm.org/D80400
Replace with forward declaration and move dependency down to source files that actually need it.
Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
When performing codegen at optnone, don't add alias analysis to
the pipeline. We don't need it, but it causes an unnecessary
dominator tree calculation.
I've also moved the module verifier call to the top so that a bunch
of disabled-at-optnone passes group more nicely.
Differential Revision: https://reviews.llvm.org/D80378
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
This temporarily reverts commit 7019cea26d.
It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to timeout and notify failed builds. While investigations are ongoing to find out why this happens, revert the changes.
AddressingModeMatcher::matchAddr was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t
Fixes PR46004 / OSSFuzz#22357
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).
This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
to not do it pre-legalization though-- if it can be caught earlier, great.
(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)
Differential Revision: https://reviews.llvm.org/D78852
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.
This revision also changes the order of the emitted instructions in some test cases.
Reviewers: efriedma, hfinkel, craig.topper, dmgreen
Reviewed By: efriedma
Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80161
Summary:
For some targets generic combines don't really do much and they
consume a disproportionate amount of time.
There's not really a mechanism in SDISel to tactically disable
combines, but we can have a switch to disable all of them and
let the targets just implement what they specifically need.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79112
When moving an instruction into a block where it was referenced by a phi when peeling,
refer to the phi's register number and assert that the instruction has it in its destinations.
This way, it also covers instructions with more than one destination.
Patch by Hendrik Greving!
Differential Revision: https://reviews.llvm.org/D80027
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to stack or generated in register by MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to appropriately
sized integer and record as constant into statepoint and later, into
StackMap.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.
While there also make the i32 stores independent and use a token
factor to join before the load.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer. This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated. I
updated clang's handling of C++ references, and added a release note for
this.
Differential Revision: https://reviews.llvm.org/D80072
Previously this code just used a default constructed
MachinePointerInfo. But we know the accesses are to a fixed stack
object or at least somewhere on the stack.
While there fix the alignment passed to the full vector load/stores.
I don't think this function is currently exercised in tree so I
don't know how to test it. I just noticed it when I removed
non-constant index support in this function.
Differential Revision: https://reviews.llvm.org/D80058
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Replace with forward declarations and move necessary includes down to source files.
Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.
This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.
Replace with forward declarations and move includes down to source files where required.
I also needed to move the TargetLoweringObjectFile::SectionForGlobal wrapper implementation down into TargetLoweringObjectFile.cpp
This reverts commit 525a591f0f.
Fixed an issue with pointers to members based on typedefs. In this case,
LLVM would emit a second UDT. I fixed it by not passing the class type
to getTypeIndex when the base type is not a function type. lowerType
only uses the class type for direct function types. This suggests if we
have a PMF with a function typedef, there may be an issue, but that can
be solved separately.
verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.
I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.
Differential Revision: https://reviews.llvm.org/D80106
I have changed the pass so that we ignore shuffle vectors with
scalable vector types, and replaced VectorType with FixedVectorType
in the rest of the pass. I couldn't think of an easy way to test
this change, since for scalable vectors we shouldn't be using
shufflevectors for interleaving. This change fixes up some
type size assert warnings I found in the following test:
CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
Differential Revision: https://reviews.llvm.org/D79700
> Before this patch, S_[L|G][THREAD32|DATA32] records were emitted with a simple name, not the fully qualified name (namespace + class scope).
>
> Differential Revision: https://reviews.llvm.org/D79447
This causes asserts in Chromium builds:
CodeViewDebug.cpp:2997: void llvm::CodeViewDebug::emitDebugInfoForUDTs(const std::vector<std::pair<std::string, const DIType *>> &):
Assertion `OriginalSize == UDTs.size()' failed.
I will follow up on the Phabricator issue.
for variables in nested scopes (including inlined functions) if there is a
single location which covers the entire scope and the scope is contained in a
single block.
Based on work by @jmorse.
Reviewed By: vsk, aprantl
Differential Revision: https://reviews.llvm.org/D79571
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
This is a no-op/NFC at the moment & generally makes the code /somewhat/
cleaner/less reliant on assumptions about what will produce a debug_addr
section.
It's still a bit "spooky action at a distance" - the add ranges code
pre-emptively inserts addresses into the address pool it knows will
eventually be used by the range emission code (or low/high pc).
The 'ideal' would be either to actually compute the addresses needed for
range (& loc) emission earlier - which would mean decanonicalizing the
range/loc representation earlier to account for whether it was going to
use addrx encodings or not (which would be unfortunate, but could be
refactored to be relatively unobtrusive).
Alternatively, emitting the range/loc sections earlier would cause them
to request the needed addresses sooner - but then you endup having to
split finalizeModuleInfo because some things need to be handled there
before the ranges/locs are emitted, I think...
We know the pointer somewhere on the stack, we just don't know
exactly where since the index may be variable.
Differential Revision: https://reviews.llvm.org/D80060
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is basically the same patch as D63233, but converted to
funnel shifts rather than regular shifts. I did not see a
way to effectively share code for these 2 cases though.
This follows D79718 and D79827 to re-fix PR37426 because
that gets canonicalized to funnel shift intrinsics in IR.
I did draft an alternative patch as an enhancement to
"shouldSinkOperands()", but that was awkward because
we have to key the transform from the select, but then
look at both its users and its operands.
The code was calculating an offset from a stack pointer SDValue.
This is exactly what getMemBasePlusOffset does. I also replaced
sizeof(int) with a hardcoded 4. We know the type we're operating
on is 4 bytes. But the size of int that the source code is being
compiled with isn't guaranteed to be 4 bytes.
While here replace another use of getMemBasePlusOffset that was
proceeded with a call to getConstant with the other signature
that call getConstant internally.
This bug is exposed by Test7 of ehthrow.cxx in MSVC EH suite where
a rethrow occurs in a try-catch inside a catch (i.e., a nested Catch
handlers). See the test code in
https://github.com/microsoft/compiler-tests/blob/master/eh/ehthrow.cxx#L346
When an object is rethrown in a Catch handler, the copy-ctor of this
object must be executed after the destructions of live objects, but
BEFORE the dtors of live objects in parent handlers.
Today Windows 64-bit runtime (__CxxFrameHandler3 & 4) expects nested Catch
handers
are stored in pre-order (outer first, inner next) in $tryMap$ table, so
that given a State, its Catch's beginning State can be properly
retrieved. The Catch beginning state (which is also the ending State) is
the State where rethrown object's copy-ctor must take place.
LLVM currently stores nested catch handlers in post-ordering because
it's the natural way to compute the highest State in Catch.
The fix is to simply store TryCatch handler in pre-order, but update
Catch's highest State after child Catches are all processed.
Differential Revision: https://reviews.llvm.org/D79474?id=263919
Summary:
In the the given example, a stack slot pointer is merged
between a setjmp and longjmp. This pointer is spilled,
so it does not get correctly restored, addinga undefined
behaviour where it shouldn't.
Change-Id: I60ec010844f2a24ce01ceccf12eb5eba5ab94abb
Reviewers: eli.friedman, thanm, efriedma
Reviewed By: efriedma
Subscribers: MatzeB, qcolombet, tpr, rnk, efriedma, hiraditya, llvm-commits, chill
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77767
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
Before this patch, S_[L|G][THREAD32|DATA32] records were emitted with a simple name, not the fully qualified name (namespace + class scope).
Differential Revision: https://reviews.llvm.org/D79447
For now I have changed FoldConstantVectorArithmetic to return early
if we encounter a scalable vector, since the subsequent code assumes
you can perform lane-wise constant folds. However, in future work we
should be able to extend this to look at splats of a constant value
and fold those if possible. I have also added the same code to
FoldConstantArithmetic, since that deals with vectors too.
The warnings I fixed in this patch were being generated by this
existing test:
CodeGen/AArch64/sve-int-arith.ll
Differential Revision: https://reviews.llvm.org/D79421
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.
This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and C-lang support for BFloat.
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau
Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78190
Summary:
D78319 introduced basic support for inline asm input operands in GlobalISel.
However, that patch did not handle the case where a memory input operand still needs to
be indirectified. Later code asserts that the memory operand is already indirect.
This patch adds an early return false to trigger the SelectionDAG fallback for now.
Reviewers: arsenm, paquette
Reviewed By: arsenm
Subscribers: thakis, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79955
I've created a new variant of CreateStackTemporary that takes
TypeSize and Align arguments, and made the older instances of
CreateStackTemporary call this new function. This refactoring is
in preparation for more patches in this area related to scalable
vectors and improving the alignment calculations.
Differential Revision: https://reviews.llvm.org/D79933
This patch adds support for DWARF attribute DW_AT_data_location.
Summary:
Dynamic arrays in fortran are described by array descriptor and
data allocation address. Former is mapped to DW_AT_location and
later is mapped to DW_AT_data_location.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79592
llvm rejects DWARF operator DW_OP_push_object_address.This DWARF
operator is needed for Flang to support allocatable array.
Summary:
Currently llvm rejects DWARF operator DW_OP_push_object_address.
below error is produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll
[..]
There are some parts missing in support of this operator, need to
be completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79306
Summary:
In the patch D78849, it uses llvm::any_of to instead of for loop to
simplify the function addRequired().
It's obvious that above code is not a NFC conversion. Because any_of
will return if any addRequired(Reg) is true immediately, but we want
every element to call addRequired(Reg).
This patch uses for_range loop to fix above any_of bug.
Reviewed By: MaskRay, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D79872
Summary:
D78319 introduced basic support for inline asm input operands in GlobalISel.
However, that patch did not handle the case where a memory input operand still needs to
be indirectified. Later code asserts that the memory operand is already indirect.
This patch adds an early return false to trigger the SelectionDAG fallback for now.
Reviewers: arsenm, paquette
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79955
We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.
The only visible change from adding it is in ARM cost model.
To me it looks reasonable. With doubling a vector size it
quadruples the cost up to the size 8 and then it did only
double it. Now it also quadruples, which seems a logical
progression to me.
Actual AMDGPU code is to follow, this is a common part, plus
load/store legalization in the AMDGPU BE not to break what
works now.
Differential Revision: https://reviews.llvm.org/D79952
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
Use an extra shift-by-1 instead of a compare and select to handle the
shift-by-zero case. This sometimes saves one instruction (if the compare
couldn't be combined with a previous instruction). It also works better
on targets that don't have good select instructions.
Note that currently this change doesn't affect most targets because
expandFunnelShift is not used because funnel shift intrinsics are
lowered early in SelectionDAGBuilder. But there is work afoot to change
that; see D77152.
Differential Revision: https://reviews.llvm.org/D77301
Expands on the enablement of the shouldSinkOperands() TLI hook in:
D79718
The last codegen/IR test diff shows what I suspected could happen - we were
sinking all splat shift operands into a loop. But that's not what we want in
general; we only want to sink the *shift amount* operand if it is a splat.
Differential Revision: https://reviews.llvm.org/D79827
It sounds like an interesting idea in theory, but nothing is actually
taking advantage of it, and specifying/implementing the edge cases is
painful. So just forbid it.
Differential Revision: https://reviews.llvm.org/D79814
Under MVE a vdup will always take a gpr register, not a floating point
value. During DAG combine we convert the types to a bitcast to an
integer in an attempt to fold the bitcast into other instructions. This
is OK, but only works inside the same basic block. To do the same trick
across a basic block boundary we need to convert the type in
codegenprepare, before the splat is sunk into the loop.
This adds a convertSplatType function to codegenprepare to do that,
putting bitcasts around the splat to force the type to an integer. There
is then some adjustment to the code in shouldSinkOperands to handle the
extra bitcasts.
Differential Revision: https://reviews.llvm.org/D78728
This patch extends DIModule Debug metadata in LLVM to support
Fortran modules. DIModule is extended to contain File and Line
fields, these fields will be used by Flang FE to create debug
information necessary for representing Fortran modules at IR level.
Furthermore DW_TAG_module is also extended to contain these fields.
If these fields are missing, debuggers like GDB won't be able to
show Fortran modules information correctly.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79484
GNU ld's internal linker script uses (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=add44f8d5c5c05e08b11e033127a744d61c26aee)
.text :
{
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(SORT(.text.sorted.*))
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf.em. */
*(.gnu.warning)
}
Because `*(.text.exit .text.exit.*)` is ordered before `*(.text .text.*)`, in a -ffunction-sections build, the C library function `exit` will be placed before other functions.
gold's `-z keep-text-section-prefix` has the same problem.
In lld, `-z keep-text-section-prefix` recognizes `.text.{exit,hot,startup,unlikely,unknown}.*`, but not `.text.{exit,hot,startup,unlikely,unknown}`, to avoid the strange placement problem.
In -fno-function-sections or -fno-unique-section-names mode, a function whose `function_section_prefix` is set to `.exit"`
will go to the output section `.text` instead of `.text.exit` when linked by lld.
To address the problem, append a dot to become `.text.exit.`
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D79600
Summary:
ConstantExprs involving operations on <1 x Ty> could translate into MIR
that failed to verify with:
*** Bad machine code: Reading virtual register without a def ***
The problem was that translate(const Constant &C, Register Reg) had
recursive calls that passed the same Reg in for the translation of a
subexpression, but without updating VMap for the subexpression first as
translate(const Constant &C, Register Reg) expects.
Fix this by using the same translateCopy helper function that we use for
translating Instructions. In some cases this causes extra G_COPY
MIR instructions to be generated.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45576
Reviewers: arsenm, volkan, t.p.northover, aditya_nandakumar
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78378
I have fixed up some places in SelectionDAG::getNode() where we
used to assert that the number of vector elements for two types
are the same. I have changed such cases to assert that the
element counts are the same instead. I've added new tests that
exercise the code paths for all the truncations. All the extend
operations are covered by this existing test:
CodeGen/AArch64/sve-sext-zext.ll
For the ISD::SETCC case I fixed this code path is exercised by
these existing tests:
CodeGen/AArch64/sve-fcmp.ll
CodeGen/AArch64/sve-intrinsics-int-compares-with-imm.ll
Differential Revision: https://reviews.llvm.org/D79399
allocas in LLVM IR have a specified alignment. When that alignment is
specified, the alloca has at least that alignment at runtime.
If the specified type of the alloca has a higher preferred alignment,
SelectionDAG currently ignores that specified alignment, and increases
the alignment. It does this even if it would trigger stack realignment.
I don't think this makes sense, so this patch changes that.
I was looking into this for SVE in particular: for SVE, overaligning
vscale'ed types is extra expensive because it requires realigning the
stack multiple times, or using dynamic allocation. (This currently isn't
implemented.)
I updated the expected assembly for a couple tests; in particular, for
arg-copy-elide.ll, the optimization in question does not increase the
alignment the way SelectionDAG normally would. For the rest, I just
increased the specified alignment on the allocas to match what
SelectionDAG was inferring.
Differential Revision: https://reviews.llvm.org/D79532
It is bad practice to capture by default (via [&] in this case) when
using lambdas, so we should avoid that as much as possible.
This patch fixes that in the getForwardingRegsDefinedByMI
from DwarfDebug module.
Differential Revision: https://reviews.llvm.org/D79616
We should use explicit type instead of auto type deduction when
the type is so obvious. In addition, we remove ambiguity, since auto
type deduction sometimes is not that intuitive, so that could lead
us to some unwanted behavior.
This patch fixes that in the collectCallSiteParameters() from
DwarfDebug module.
Differential Revision: https://reviews.llvm.org/D79624
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.
This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
I don't have any test cases since X86 doesn't return any tied
operands from getUndefRegClearance today. But conceivably we could
want BreakFalseDeps to insert a dependency breaking XOR for
a tied operand in the future.
Currently this code exists in widenScalar for G_MERGE_VALUE
sources. I'm not sure if the existing expansion in widenScalar should
be removed or not. The widenScalar variant tries to extend to the
requested size, but this just uses the original bitwidth.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.
Removing getAlignment() will be done as a follow up.
Differential Revision: https://reviews.llvm.org/D79436
This fixes a verifier failure on a bot:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/
```
*** Bad machine code: MBB has duplicate entries in its successor list. ***
- function: foo
- basic block: %bb.5 indirectgoto (0x7fe3d687ca08)
```
One of the GCC torture suite tests (pr70460.c) has an indirectbr instruction
which has duplicate blocks in its destination list.
According to the langref this is allowed:
> Blocks are allowed to occur multiple times in the destination list, though
> this isn’t particularly useful.
(https://www.llvm.org/docs/LangRef.html#indirectbr-instruction)
We don't allow this in MIR. So, when we translate such an instruction, the
verifier screams.
This patch makes `translateIndirectBr` check if a successor has already been
added to a block. If the successor is present, it is skipped rather than added
twice.
Differential Revision: https://reviews.llvm.org/D79609
them in a special text section.
For sampleFDO, because the optimized build uses profile generated from
previous release, previously we couldn't tell a function without profile
was truely cold or just newly created so we had to treat them conservatively
and put them in .text section instead of .text.unlikely. The result was when
we persuing the best performance by locking .text.hot and .text in memory,
we wasted a lot of memory to keep cold functions inside.
In https://reviews.llvm.org/D66374, we introduced profile symbol list to
discriminate functions being cold versus functions being newly added.
This mechanism works quite well for regular use cases in AutoFDO. However,
in some case, we can only have a partial profile when optimizing a target.
The partial profile may be an aggregated profile collected from many targets.
The profile symbol list method used for regular sampleFDO profile is not
applicable to partial profile use case because it may be too large and
introduce many false positives.
To solve the problem for partial profile use case, we provide an option called
--profile-unknown-in-special-section. For functions without profile, we will
still treat them conservatively in compiler optimizations -- for example,
treat them as warm instead of cold in inliner. When we use profile info to
add section prefix for functions, we will discriminate functions known to be
not cold versus functions without profile (being unknown), and we will put
functions being unknown in a special text section called .text.unknown.
Runtime system will have the flexibility to decide where to put the special
section in order to achieve a balance between performance and memory saving.
Differential Revision: https://reviews.llvm.org/D62540
If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely.
Differential Revision: https://reviews.llvm.org/D79572
With a fix to uninitialized EndOffset.
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes handling that operation in DWARF64
units.
Differential Revision: https://reviews.llvm.org/D79501
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes handling that operation in DWARF64
units.
Differential Revision: https://reviews.llvm.org/D79501
CorrectExtraCFGEdges function.
The latter was a workaround for "Various pieces of code" leaving bogus
extra CFG edges in place. Where by "various" it meant only
IfConverter::MergeBlocks, which failed to clear all of the successors
of dead blocks it emptied out. This wouldn't matter a whole lot,
except that the dead blocks remained listed as predecessors of
still-useful blocks, inhibiting optimizations.
This fix slightly changed two thumb tests, because the correct CFG
successors allowed for the "diamond" if-conversion pattern to be
detected, when it could only use "simple" before.
Additionally, the removal of a now-redundant call to analyzeBranch
(with AllowModify=true) in BranchFolder::OptimizeFunction caused a
later check for an empty block in BranchFolder::OptimizeBlock to
fail. Correct this by moving the call to analyzeBranch in
OptimizeBlock higher.
Differential Revision: https://reviews.llvm.org/D79527
Summary:
This helps detect some missed BFI updates during CodeGenPrepare.
This is debug build only and disabled behind a flag.
Fix a missed update in CodeGenPrepare::dupRetToEnableTailCallOpts().
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77417
When peeling out the epilogue we need to ignore illegal phis coming from stages
greater than the producer stage. Otherwise we end up with circular phi
dependencies.
Differential Revision: https://reviews.llvm.org/D79581
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.
For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.
In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>)
Reviewers: sdesmalen, efriedma, huntergr
Reviewed By: efriedma
Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78812
Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for integer type it's applied to.
Fixes the regression introduced by D79096
Differential Revision: https://reviews.llvm.org/D79405
Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)
This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794
As noted in the code comment, this is uglier than I was hoping because
the opcode determines whether we pass the source or destination type
to isOperationLegalOrCustom(). Also IIUC, there's no way to validate
what the other (dest or src) type is. Without the extra legality check
on that, there's an ARM regression test in:
test/CodeGen/ARM/isel-v8i32-crash.ll
...that will crash trying to lower an unsupported v8f32 to v8i16.
Differential Revision: https://reviews.llvm.org/D79360
This patch adds ORE for MachinePipeliner, so that people can anaylyze
their code using opt-viewer or other tools, then optimize the code to
catch more piplining opportunities.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D79368
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bail out for vector types we don't yet
support or don't intend to support.
It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.
Differential revision: https://reviews.llvm.org/D79083
Register live ranges may have had gaps that after coalescing should be
removed. This is done by adding a new segment to the range, and merging
it with neighboring segments. When doing so, do not assume that each
subrange of the register ended at the same index. If a subrange ended
earlier, adding this segment could make the live range invalid.
Instead, if the subrange is not live at the start of the segment,
extend it first.
Today symbol names generated for machine basic block sections use a
unary encoding to reduce bloat. This is essential when every basic block
in the binary is assigned a symbol however with basic block clusters
(rG05192e585ce175b55f2a26b83b4ed7882785c8e6) when we only need to
generate a few non-temporary symbols we can assign more descriptive
names making them more user friendly. With this change -
Cold cluster section for function foo is named "foo.cold"
Exception cluster section for function foo is named "foo.eh"
Other cluster sections identified by their ids are named "foo.ID"
Using this format works well with existing tools. It will demangle as
expected and works with existing symbolizers, profilers and debuggers
out of the box.
$ c++filt _Z3foov.cold
foo() [clone .cold]
$ c++filt _Z3foov.eh
foo() [clone .eh]
$c++filt _Z3foov.1234
foo() [clone 1234]
Tests for basicblock-sections are updated with some cleanup where
appropriate.
Differential Revision: https://reviews.llvm.org/D79221
Before this patch, global variables didn't have their namespace prepended in the Codeview debug symbol stream. This prevented Visual Studio from displaying them in the debugger (they appeared as 'unspecified error')
Differential Revision: https://reviews.llvm.org/D79028
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and and were to falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.
The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.
Differential Revision: https://reviews.llvm.org/D79096
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.
Differential Revision: https://reviews.llvm.org/D79045
X86 matches several 'shift+xor' funnel shift patterns:
fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)) -> (fshl x0, x1, y)
fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
These patterns are also what we end up with the proposed expansion changes in D77301.
This patch moves these to DAGCombine's generic MatchFunnelPosNeg.
All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.
Reviewed By: @spatel
Differential Revision: https://reviews.llvm.org/D78935
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.
In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.
This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.
Differential Revision: https://reviews.llvm.org/D78992
Prior to D69446 I had done some NFC cleanup to make landing an iterative
outliner a cleaner more straight-forward patch. Since then, it seems that has
landed but I noticed some ways it could be cleaned up. Specifically:
1) doOutline was meant to be the re-runable function, but instead
runOnceOnModule was created that just calls doOutline.
2) In D69446 we discussed that the flag allowing the re-run of the
outliner should be a flag to tell how many additional times to run
the outliner again, not the total number of times. I don't think it
makes sense to introduce a flag, but print an error if the flag is
set to 0.
This is an NFCi, the i being that I get rid of the way that the
machine-outline-runs flag could be used to tell the outliner to not run
at all, and because I renamed the flag to '-machine-outliner-reruns'.
Differential Revision: https://reviews.llvm.org/D79070
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D78347
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.
I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.
Differential Revision: https://reviews.llvm.org/D78011
like .cfi_restore"
Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.
Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set have been saved at its CFG predecessors's exits, and compare the CSR
set with the set at its previous basicblock's exit (The previous block is the
block laid before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
Differential Revision: https://reviews.llvm.org/D74303
Summary:
Reviewing failures identified in D78586, I was finding the identifiers
for these iterators hard to read.
Reviewers: efriedma, MaskRay, jyknight
Reviewed By: MaskRay
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78849
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.
<rdar://problem/62460788>
Differential Revision: https://reviews.llvm.org/D78947
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D78636
This is a NFC patch for D77319. The idea is to hide the getNegatibleCost inside the getNegatedExpression()
to have it return null if the cost is expensive, and add some helper function for easy to use. And
rename the old getNegatedExpression to negateExpression to avoid the semantic conflict.
Reviewed By: RKSimon
Differential revision: https://reviews.llvm.org/D78291
Summary:
Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks,
this change updates the behaviour to use an appropriate prefix
instead. This allows lld to group basic block sections together
when -z,keep-text-section-prefix is specified and matches the behaviour
observed in gcc.
Reviewers: tmsriram, mtrofin, efriedma
Reviewed By: tmsriram, efriedma
Subscribers: eli.friedman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78742
Follow-up of D78082 and D78590.
Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This didn't show up seemingly because it only happens when there is
no signext/zeroext parameter attribute, which I think for Darwin clang adds.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distiguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Summary:
Add a check to make sure that MachineInstr::mayAlias returns prematurely if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should yields
should eventually yield V=8/5. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().
Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.
Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563
Reviewers: craig.topper, mehdi_amini, nicolasvasilache
Reviewed By: craig.topper
Subscribers: hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78608
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b
into our tree broke this test. Fix by using TLI.getFrameIndexTy().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785
Loosen the restriction on what kinds of opcodes can be CSEd as
targets may want to CSE some generic target specific pseudos.
NFC as far as this change is concerned as CSEConfig still pretty much is
a subset of this check.
Differential Revision: https://reviews.llvm.org/D78684
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization due to areCFlagsAccessedBetweenInstrs returning the wrong
result.
In test/CodeGen/AArch64/arm64-csel.ll, the issue was found in the
function @foo5, in which the first compare could successfully be
optimized but not the second.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, dsanders, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78157
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization in optimizeCompareInstr due to canInstrSubstituteCmpInstr
returning the wrong result.
Depends on D78137.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits, dsanders
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78151
Summary:
While lowering memory intrinsics, GIsel attempts to form a tail call to
a library routine.
There might be a DBG_LABEL or something after the intrinsic call,
though: in that case, GIsel should still be able to form the tail call,
and should also delete the debug insts after the tail call as the
transform makes them invalid.
Reviewers: dsanders, aemerson
Subscribers: hiraditya, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78335
Summary:
Fix an issue which could result in ElideBrByInvertingCond or
CombineIndexedLoadStore being missed when debug info is present. In both
cases the fix is s/hasOneUse/hasOneNonDbgUse/.
Reviewers: aemerson, dsanders
Subscribers: hiraditya, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78254
Summary:
This fixes several issues where the presence of debug instructions could
disable certain combines, due to dominance queries finding uses/defs that
don't actually exist.
Reviewers: dsanders, fhahn, paquette, aemerson
Subscribers: hiraditya, arphaman, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78253
Summary:
It looks like RegBankSelect can try to assign a bank based on a
DBG_VALUE instead of ignoring it. This eventually leads to an assert
in AArch64RegisterBankInfo::getInstrMapping because there is some info
missing from the DBG_VALUE MachineOperand (I see: `Assertion failed:
(RawData != 0 && "Invalid Type"), function getScalarSizeInBits`).
I'm not 100% sure it's safe to insert DBG_VALUE instructions right
before RegBankSelect (that's what -debugify-and-strip-all-safe is
doing). Any advice appreciated.
Depends on D78135.
Reviewers: ab, qcolombet, dsanders, aprantl
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78137
Summary:
These helpers are exercised by follow-up commits in this patch series,
which is all about removing CodeGen differences with vs. without debug
info in the AArch64 backend.
Reviewers: fhahn, aprantl, jpaquette, paquette
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78260
Summary:
Teach MachineDebugify how to insert DBG_VALUE instructions. This can
help find bugs causing CodeGen differences when debug info is present.
DBG_VALUE instructions are only emitted when -debugify-level is set to
locations+variables.
There is essentially no attempt made to match up DBG_VALUE register
operands with the local variables they ought to correspond to. I'm not
sure how to improve the situation. In some cases (MachineMemOperand?)
it's possible to find the IR instruction a MachineInstr corresponds to,
but in general this seems to call for "undoing" the work done by ISel.
Reviewers: dsanders, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78135
Summary:
Make GenericScheduler compute SchedDFSResult on initialization if
the policy is set. This makes it possible to create classes
that extend GenericScheduler and rely on the results of SchedDFSResult,
e.g. to perform subtree scheduling.
NFC unless the policy is set.
Subscribers: MatzeB, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78432
Preserving liveness can be useful even late in the pipeline, if we're
doing substantial optimization work afterwards. (See, for example,
D76065.) Teach MachineOutliner how to correctly set live-ins on the
basic block in outlined functions.
Differential Revision: https://reviews.llvm.org/D78605
Hash Jump Table Indices uniquely within a basic block for MIR
Canonicalizer / MIR VReg Renamer passes.
Differential Revision: https://reviews.llvm.org/D77966
We were disabling verification for no reason in a bunch of places; just
turn it on.
At this point, there are two key places where we don't run verification:
during register allocation, and after addPreEmitPass. Regalloc probably
isn't worth messing with; it has its own invariants, and verifying
afterwards is probably good enough. For after addPreEmitPass, it's
probably worth investigating improvements.
In a future change we should properly fix xray_fn_idx to use PC-relative
addresses as well, but for now let's keep absolute addresses until sled
addresses are all fixed.
Summary:
Machine Block Frequency Info (MBFI) is being computed but unused in AsmPrinter.
MBFI computation was introduced with PGO change D71149 and then its use was
removed in D71106. No need to keep computing it.
Reviewers: MaskRay, jyknight, skan, yamauchi, davidxl, efriedma, huihuiz
Reviewed By: MaskRay, skan, yamauchi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78526
xray_instr_map contains absolute addresses of sleds, which are relocated
by `R_*_RELATIVE` when linked in -pie or -shared mode.
By making these addresses relative to PC, we can avoid the dynamic
relocations and remove the SHF_WRITE flag from xray_instr_map. We can
thus save VM pages containg xray_instr_map (because they are not
modified).
This patch changes x86-64 and bumps the sled version to 2. Subsequent
changes will change powerpc64le and AArch64.
Reviewed By: dberris, ianlevesque
Differential Revision: https://reviews.llvm.org/D78082
Summary:
The repeated use of std::next() on a MachineBasicBlock::iterator was
clever, but we only need to reconstruct the iterator post creation of
the spill instruction.
This helps simplifying where we plan to place the spill, as discussed in
D77849.
From here, we can simplify the code a little by flipping the return code
of a helper.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: qcolombet, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78520
- Adding changes to support comments on outlined functions with outlining for the conditions through which it was outlined (e.g. Thunks, Tail calls)
- Adapts the emitFunctionHeader to print out a comment next to the header if the target specifies it based on information in MachineFunctionInfo
- Adds mir test for function annotiation
Differential Revision: https://reviews.llvm.org/D78062
Summary:
Similar to the CallLowering class used for lowering LLVM IR calls to MIR calls,
we introduce a separate class for lowering LLVM IR inline asm to MIR INLINEASM.
There is no functional change yet, all existing tests should pass.
Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette
Reviewed By: aemerson
Subscribers: gargaroff, wdng, mgorny, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78316
Summary:
The patch D29014 has added the new ISD::FREEZE and can deal with the
integer.
The patch D76980 has added SoftenFloatRes_FREEZE for float point.
But we still lack of expand for ppc_fp128, this will cause assertion for
some cases.
This patch is to support freeze expand for ppc_fp128.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78278
[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining
New test to check that we only outline CFI instruction if all CFI
Instructions in the function would be captured by the outlining
adding x86 tests analagous to AARCH64 cfi tests
Revision: https://reviews.llvm.org/D77852
We should only modify existing ones. Previously, we were creating
MachineFunctions for externally-available functions. AFAICT this was benign
in tree but ultimately led to asan bugs in our out of tree target.
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: dexonsmith, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: cfe-commits, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D77278
Summary:
AIX symbol have qualname and unqualified name. The stock getSymbol
could only return unqualified name, which leads us to patch many
caller side(lowerConstant, getMCSymbolForTOCPseudoMO).
So we should try to address this problem in the callee
side(getSymbol) and clean up the caller side instead.
Note: this is a "mostly" NFC patch, with a fix for the original
lowerConstant behavior.
Differential Revision: https://reviews.llvm.org/D78045
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.
Differential Revision: https://reviews.llvm.org/D77135
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.
This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.
CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.
Differential Revision: https://reviews.llvm.org/D76947
Ensure that symbols explicitly* assigned a section name are placed into
a section with a compatible entry size.
This is done by creating multiple sections with the same name** if
incompatible symbols are explicitly given the name of an incompatible
section, whilst:
- Avoiding using uniqued sections where possible (for readability and
to maximize compatibly with assemblers).
- Creating as few SHF_MERGE sections as possible (for efficiency).
Given that each symbol is assigned to a section in a single pass, we
must decide which section each symbol is assigned to without seeing the
properties of all symbols. A stable and easy to understand assignment is
desirable. The following rules facilitate this: The "generic" section
for a given section name will be mergeable if the name is a mergeable
"default" section name (such as .debug_str), a mergeable "implicit"
section name (such as .rodata.str2.2), or MC has already created a
mergeable "generic" section for the given section name (e.g. in response
to a section directive in inline assembly). Otherwise, the "generic"
section for a given name is non-mergeable; and, non-mergeable symbols
are assigned to the "generic" section, while mergeable symbols are
assigned to uniqued sections.
Terminology:
"default" sections are those always created by MC initially, e.g. .text
or .debug_str.
"implicit" sections are those created normally by MC in response to the
symbols that it encounters, i.e. in the absence of an explicit section
name assignment on the symbol, e.g. a function foo might be placed into
a .text.foo section.
"generic" sections are those that are referred to when a unique section
ID is not supplied, e.g. if there are multiple unique .bob sections then
".quad .bob" will reference the generic .bob section. Typically, the
generic section is just the first section of a given name to be created.
Default sections are always generic.
* Typically, section names might be explicitly assigned in source code
using a language extension e.g. a section attribute: _attribute_
((section ("section-name"))) -
https://clang.llvm.org/docs/AttributeReference.html
** I refer to such sections as unique/uniqued sections. In assembly the
", unique," assembly syntax is used to express such sections.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43457.
See https://reviews.llvm.org/D68101 for previous discussions leading to
this patch.
Some minor fixes were required to LLVM's tests, for tests had been using
the old behavior - which allowed for explicitly assigning globals with
incompatible entry sizes to a section.
This fix relies on the ",unique ," assembly feature. This feature is not
available until bintuils version 2.35
(https://sourceware.org/bugzilla/show_bug.cgi?id=25380). If the
integrated assembler is not being used then we avoid using this feature
for compatibility and instead try to place mergeable symbols into
non-mergeable sections or issue an error otherwise.
Differential Revision: https://reviews.llvm.org/D72194
Summary:
Original description (https://reviews.llvm/org/D69924)
Without this change, when a nested tag type of any kind (enum, class,
struct, union) is used as a variable type, it is emitted without
emitting the parent type. In CodeView, parent types point to their inner
types, and inner types do not point back to their parents. We already
walk over all of the parent scopes to build the fully qualified name.
This change simply requests their type indices as we go along to enusre
they are all emitted.
Now, while walking over the parent scopes, add the types to
DeferredCompleteTypes, since they might already be in the process of
being emitted.
Fixes PR43905
Reviewers: rnk, amccarth
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78249
Summary:
This verifier tries to ensure that DebugLoc's don't just disappear as
we transform the MIR. It observes the instructions created, erased, and
changed and at checkpoints chosen by the client algorithm verifies the
locations affected by those changes.
In particular, it verifies that:
* Every DebugLoc for an erased/changing instruction is still present on
at least one new/changed instruction
* Failing that, that there is a line-0 location in the new/changed
instructions. It's not possible to confirm which locations were merged so
it conservatively assumes all unaccounted for locations are accounted
for by any line-0 location to avoid false positives.
If that fails, it prints the lost locations in the debug output along with
the instructions that should have accounted for them.
In theory, this is usable by the legalizer, combiner, selector and any other
pass that performs incremental changes to the MIR. However, it has so far
only really been tested on the legalizer (not including the artifact
combiner) where it has caught lots of lost locations, particularly in Custom
legalizations. There's only one example here as my initial testing was on an
out-of-tree target and I haven't done a pass over the in-tree targets yet.
Depends on D77575, D77446
Reviewers: bogner, aprantl, vsk
Subscribers: jvesely, nhaehnle, mgorny, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77576
Summary:
This will allow us to fix the issue where the lost locations
verifier causes CodeGen changes on lost locations because it
falls back on DAGISel
Reviewers: qcolombet, bogner, aprantl, vsk, paquette
Subscribers: rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78261
BreakPHIEdge would be set based on whether the instruction needs to
insert a new critical edge to allow sinking into a block where the uses
are PHI nodes. But for instructions with multiple defs it would be reset
on the second def, allowing the instruciton to sink where it should not.
Fixes PR44981
Differential Revision: https://reviews.llvm.org/D78087
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.
Reviewers: SjoerdMeijer, arsenm, efriedma
Reviewed By: arsenm
Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78088
Summary:
The current handleMoveIntoBundle implementation is unusable,
it attempts to access the slot indexes of bundled instructions.
It also leaves bundled instructions with slot indexes assigned.
Replace handleMoveIntoBundle this with a more explicit
handleMoveIntoNewBundle function which recalculates the live
intervals for all instructions moved into a newly formed bundle,
and removes slot indexes from these instructions.
Reviewers: arsenm, MaskRay, kariddi, tpr, qcolombet
Reviewed By: qcolombet
Subscribers: MatzeB, wdng, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77969
In D68209, LiveDebugValues::transferDebugValue had a call to
OpenRanges.erase shifted, and by accident this led to a code path where
DBG_VALUEs of $noreg would not have their open range terminated, allowing
variable locations to extend past blocks where they were terminated.
This patch correctly terminates the open range, if present, when such a
DBG_VAUE is encountered, and adds a test for this behaviour.
Differential Revision: https://reviews.llvm.org/D78218
The "Align" passed into getMachineMemOperand etc. is the alignment of
the MachinePointerInfo, not the alignment of the memory operation.
(getAlign() on a MachineMemOperand automatically reduces the alignment
to account for this.)
We were passing on wrong (overconservative) alignment in a bunch of
places. Fix a bunch of these, mostly in legalization. And while I'm
here, switch to the new Align APIs.
The test changes are all scheduling changes: the biggest effect of
preserving large alignments is that it improves alias analysis, so the
scheduler has more freedom.
(I was originally just trying to do a minor cleanup in
SelectionDAGBuilder, but I accidentally went deeper down the rabbit
hole.)
Differential Revision: https://reviews.llvm.org/D77687
Summary:
As a follow up to https://reviews.llvm.org/D29014, add translation
support for freeze.
Introduce a new generic instruction G_FREEZE and translate freeze to it.
Reviewers: dsanders, aqjune, arsenm, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson
Reviewed By: aqjune, arsenm
Subscribers: fhahn, lebedev.ri, wdng, rovka, hiraditya, jfb, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77795
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.
Example:
int foo() {
register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
__asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
return a;
}
In contrast, GCC issues an error in the same scenario.
This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.
This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76848
To simplify future work on statepoint representation, hide
direct access to statepoint field indices and provide getters
for them. Add getters for couple more statepoint fields.
This also fixes two bugs in MachineVerifier for statepoint:
First, the `break` statement was falling out of `if` statement
scope, thus disabling following checks.
Second, it was incorrectly accessing some fields like CallingConv -
StatepointOpers gives index to their value directly, not to
preceeding field type encoding.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D78119
This is a minor NFC change to make the code more clear. We have the NegatibleCost that
has cheaper, neutral, and expensive. Typically, the smaller one means the less cost.
It is inverse for current implementation, which makes following code not easy to read.
If (CostX > CostY) negate(X)
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D77993
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
in the same section.
This allows specifying BasicBlock clusters like the following example:
!foo
!!0 1 2
!!4
This places basic blocks 0, 1, and 2 in one section in this order, and
places basic block #4 in a single section of its own.
Summary:
Share logic to strip debugify metadata between the IR and MIR level
debugify passes. This makes it simpler to hunt for bugs by diffing IR
with vs. without -debugify-each turned on.
As a drive-by, fix an issue causing CallGraphNodes to become invalid
when a dead llvm.dbg.value prototype is deleted.
Reviewers: dsanders, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77915
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.
This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.
The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.
Differential Revision: https://reviews.llvm.org/D76994
Reviewed-By: bjope
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049
Reverted in 3ce77142a6 because the previous patch
broke the expensive-checks bots. The new patch removes the broken check.
Summary: A count profile may affect tail duplication's heuristic causing a block to be duplicated in only a part of its predecessors. This is not allowed in the Machine Block Placement pass where an assert will go off. I'm removing the assert and making the optimization bail out when such case happens.
Reviewers: wenlei, davidxl, Carrot
Reviewed By: Carrot
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77748
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.
While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
This is the same as what was done to the CallLoweringInfo in
TargetLowering.h in r309159.
This is just a step on the way to replacing this with CallBase.
I only left it at the interface to ParseConstraints since that
needs updates to other callers in different files. I'll do that
as a follow up.
Differential Revision: https://reviews.llvm.org/D77892
Summary:
This allows us to test each backend pass under the presence
of debug info using pre-existing tests. The tests should not
fail as a result of this so long as it's true that debug info
does not affect CodeGen.
In practice, a few tests are sensitive to this:
* Tests that check the pass structure (e.g. O0-pipeline.ll)
* Tests that check --debug output. Specifically instruction
dumps containing MMO's (e.g. prelegalizercombiner-extends.ll)
* Tests that contain debugify metadata as mir-strip-debug will
remove it (e.g. fastisel-debugvalue-undef.ll)
* Tests with partial debug info (e.g.
patchable-function-entry-empty.mir had debug info but no
!llvm.dbg.cu)
* Tests that check optimization remarks overly strictly (e.g.
prologue-epilogue-remarks.mir)
* Tests that would inject the pass in an unsafe region (e.g.
seqpairspill.mir would inject between register alloc and
virt reg rewriter)
In all cases, the checks can either be updated or
--debugify-and-strip-all-safe=0 can be used to avoid being
affected by something like llvm-lit -Dllc='llc --debugify-and-strip-all-safe'
I tested this without the lost debug locations verifier to
confirm that AArch64 behaviour is unaffected (with the fixes
in this patch) and with it to confirm it finds the problems
without the additional RUN lines we had before.
Depends on D77886, D77887, D77747
Reviewers: aprantl, vsk, bogner
Subscribers: qcolombet, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77888
Summary:
A few tests start out with debug info and expect it to reach
the output. For these tests we shouldn't strip the debug info
Reviewers: aprantl, vsk, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77886
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: stoklund, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77272
Summary:
At the moment, any changes we make to the passes that can be
injected before/after others (e.g. -verify-machineinstrs and
-print-after-all) have to be duplicated in both
TargetPassConfig (for normal execution, -start-before/
-stop-before/etc) and llc (for -run-pass). Unify this pass
injection into addMachinePrePass/addMachinePostPass that both
TargetPassConfig and llc can use.
Reviewers: vsk, aprantl, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77887
Summary:
Refactor LiveRangeCalc such that it is now split into two classes
The objective is to split all the "register specific" logic away
from LiveRangeCalc.
The two new classes created are:
- LiveRangeCalc - is meant as a generic class to compute and modify
live ranges in a generic way. This class should deal only with
SlotIndices and VNInfo objects.
- LiveIntervalCals - is meant to be equivalent to the old LiveRangeCalc.
It computes the liveness virtual registers tracked by a LiveInterval
object.
With this refactoring LiveRangeCalc can be used to implement tracking of
liveness of LiveRanges that represent other things than just registers.
Subscribers: MatzeB, qcolombet, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76584
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.
This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, danstrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
Summary:
Removes:
* All LLVM-IR level debug info using StripDebugInfo()
* All debugify metadata
* 'Debug Info Version' module flag
* All (valid*) DEBUG_VALUE MachineInstrs
* All DebugLocs from MachineInstrs
This is a more complete solution than the previous MIRPrinter
option that just causes it to neglect to print debug-locations.
* The qualifier 'valid' is used here because AArch64 emits
an invalid one and tests depend on it
Reviewers: vsk, aprantl, bogner
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77747
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.
The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77371
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.
This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74060
Summary:
Preserve call site info for duplicated instructions. We copy over the
call site info in CloneMachineInstrBundle to avoid repeated calls to
copyCallSiteInfo in CloneMachineInstr.
(Alternatively, we could copy call site info higher up the stack, e.g.
into TargetInstrInfo::duplicate, or even into individual backend passes.
However, I don't see how that would be safer or more general than the
current approach.)
Reviewers: aprantl, djtodoro, dstenb
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77685
This is a performance patch that hoists two conditions in DwarfDebug's
validThroughout to avoid a linear-scan of all instructions in a block. We
now exit early if validThrougout will never return true for the variable
location.
The first added clause filters for the two circumstances where
validThroughout will return true. The second added clause should be
identical to the one that's deleted from after the linear-scan.
Differential Revision: https://reviews.llvm.org/D77639
Summary:
This fixes PR45302.
Previously the case
BB1
/ \
| |
TBB FBB
| |
\ /
BB2
was treated as a valid diamond also when TBB and FBB was the same basic
block. This then lead to a failed assertion in IfConvertDiamond.
Since TBB == FBB is quite a degenerated case of a diamond, we now
don't treat it as a valid diamond anymore, and thus we will avoid the
trouble of making IfConvertDiamond handle it correctly.
Reviewers: efriedma, kparzysz
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77651
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.
To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.
Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.
Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76598
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.
It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.
There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
changing the checks and still pass them. I.e. Debug info should not change
codegen. Combining this with a strip-debug pass should enable this. The
main issue I ran into without the strip-debug pass was instructions with MMO's and
checks on both the instruction and the MMO as the debug-location is
between them. I currently have a simple hack in the MIRPrinter to
resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
recently found that the legalizer can be unexpectedly lossy in seemingly
simple cases (e.g. expanding one instr into many). I have a verifier
(will be posted separately) that can be integrated with passes that use
the observer interface and will catch location loss (it does not verify
correctness, just that there's zero lossage). It is a little conservative
as the line-0 locations that arise from conflicts do not track the
conflicting locations but it can still catch a fair bit.
Depends on D77439, D77438
Reviewers: aprantl, bogner, vsk
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77446
This removes a call to getScalarType from a bunch of call sites.
It also makes the behavior consistent with SIGN_EXTEND_INREG.
Differential Revision: https://reviews.llvm.org/D77631
These should not be assuming address space 0. Calling getPointerTy is
generally the wrong thing to do, since you should already know the
type from the incoming IR.
RDA sometimes needs to visit blocks twice, to take into account
reaching defs coming in along loop back edges. Currently it handles
repeated visitation the same way as usual, which means that it will
scan through all instructions and their reg unit defs again. Not
only is this very inefficient, it also means that all reaching defs
in loops are going to be inserted twice.
We can do much better than this. The only thing we need to handle
is a new reaching def from a predecessor, which either needs to be
prepended to the reaching definitions (if there was no reaching def
from a predecessor), or needs to replace an existing predecessor
reaching def, if it is more recent. Since D77508 we only store the
most recent predecessor reaching def, so that's the only one that
may need updating.
This also has the nice side-effect that reaching definitions are
now automatically sorted and unique, so drop the llvm::sort() call
in favor of an assertion.
Differential Revision: https://reviews.llvm.org/D77511
An instruction may define the same reg unit multiple times,
avoid inserting the same reaching def multiple times in that case.
Also print the reg unit, rather than the super-register, in the
debug code.
Move the logic whether lowering of deopt value requires a spill slot in
a separate lambda.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77629
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
Summary:
In lieu of a proper pass that strips debug info, add a way
to omit debug-locations from the MIR output so that
instructions with MMO's continue to match CHECK's when
mir-debugify is used
Reviewers: aprantl, bogner, vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77575
Summary:
To debugify MIR, we need to be able to create metadata and to do that, we
need a non-const Module. However, MachineFunction only had a const reference
to the Function preventing this.
Reviewers: aprantl, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77439
A global symbol that is defined in a comdat should not generate an alias since
call sites that would've referred to that symbol will refer to their own
independent local aliases rather than the surviving global comdat one. This
could result in something that looks like:
```
ld.lld: error: relocation refers to a discarded section: .text._ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> defined in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.file.cc.o)
>>> section group signature: _ZN3fbl8internal18NullFunctionTargetIvJjjPjEED1Ev.stub
>>> prevailing definition is in user-x64-clang/obj/system/ulib/minfs/libminfs.a(minfs._sources.vnode.cc.o)
>>> referenced by function.h:169 (../../zircon/system/ulib/fbl/include/fbl/function.h:169)
>>> minfs._sources.file.cc.o:(minfs::File::AllocateAndCommitData(std::__2::unique_ptr<minfs::Transaction, std::__2::default_delete<minfs::Transaction> >)) in archive user-x64-clang/obj/system/ulib/minfs/libminfs.a
```
We ran into this when experimenting with a new C++ ABI for fuchsia
(refer to D72959) which takes relative offsets between comdat'd functions
which is why the normal C++ user wouldn't run into this.
Differential Revision: https://reviews.llvm.org/D77429
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
predecessor list was empty, so they appeared to have no entry.
This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two. This requires careful
handing of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handing of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.
The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.
This patch fixes 2 pre-existing bugs, and adds tests.
The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).
The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated. The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:
for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
continue;
PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);
was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.
SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().
Reviewers: void, craig.topper, efriedma
Reviewed By: void, efriedma
Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76961
The previous code used the type of the first field for the VT
passed to getNode for every field.
I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.
Differential Revision: https://reviews.llvm.org/D77093
When entering a basic block, RDA inserts reaching definitions coming
from predecessor blocks (which will be negative numbers) in a rather
peculiar way. If you have incoming reaching definitions -4, -3, -2, -1,
it will insert those. If you have incoming reaching definitions
-1, -2, -3, -4, it will insert -1, -1, -1, -1, as the max is taken
at each step. That's probably not what was intended...
However, RDA only actually cares about the most recent reaching
definition from a predecessor (to calculate clearance), so this
ends up working fine as far as behavior is concerned. It does
waste memory on unnecessary reaching definitions though.
This patch changes the implementation to first compute the most
recent reaching definition in one loop, and then insert only that
one in a separate loop.
Differential Revision: https://reviews.llvm.org/D77508
At the end of a basic block, RDA adjusts all the reaching defs it
found to be relative to the end of the basic block, rather than the
start of it. However, it also does this to registers which don't
have a reaching def, indicated by ReachingDefDefaultVal. This means
that code checking against ReachingDefDefaultVal will not skip them,
and may insert them into the reaching definition list. This is
ultimately harmless, but causes unnecessary work and is logically
not right.
Differential Revision: https://reviews.llvm.org/D77506
Summary:
This patch adds support for emission of following DWARFv5 macro forms
in .debug_macro section.
1. DW_MACRO_start_file
2. DW_MACRO_end_file
3. DW_MACRO_define_strp
4. DW_MACRO_undef_strp.
Reviewed By: dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D72828
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
- DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
- This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
- Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77537
We're ANDing with 1 right after which will cause the SIGN_EXTEND to
be combined to ANY_EXTEND later. Might as well just start with an
ANY_EXTEND.
While there replace create the AND using the getZeroExtendInReg
helper to remove the need to explicitly create the VecOnes constant.
This code is replacing a shift with a new shift on an extended type.
If the shift amount type can't represent the maximum shift amount
for the new type, the amount needs to be extended to a type that
can.
Previously, the code just hardcoded a check for 256 bits which
seems to have been an assumption that the original shift amount
was MVT::i8. But that seems more catered to a specific target
like X86 that uses i8 as its legal shift amount type. Other
targets may use different types.
This commit changes the code to look at the real type of the shift
amount and makes sure it has enough bits for the Log2 of the
new type. There are similar checks to this in SelectionDAGBuilder
and LegalizeIntegerTypes.
Previously line table symbol was represented as `DIE::value_iterator`
inside `DwarfCompileUnit` and subsequent function `intStmtList`
was used to create a local `MCSymbol` to initialize it.
This patch removes `DIE::value_iterator` from `DwarfCompileUnit`
and intoduce `MCSymbol` for representing this units symbol for
`debug_line` section. As a result `applyStmtList` is also modified
to utilize this. Further more a helper function `getLineTableStartSym`
is also introduced to get this symbol, this would be used by clients
which need to access this line table, i.e `debug_macro`.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D77489
The newly-created constant zero will need an extra register to hold it
in the current statepoint lowering implementation. Remove it if there
exists one.
Summary:
D77423 started using a dominator tree in WasmEHPrepare, but we deleted
BBs in `prepareThrows` before we used the domtree in `prepareEHPads`,
and those CFG changes were not reflected in the domtree. This uses
`DomTreeUpdater` to make sure we update the domtree every time we delete
BBs from the CFG. This fixes ubsan/msan/expensive_check errors caught in
LLVM buildbots.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77465
Summary:
When we insert a call to the personality function wrapper
(`_Unwind_CallPersonality`) for a catch pad, we store some necessary
info in `__wasm_lpad_context` struct and pass it. One of the info is the
LSDA address for the function. For this, we insert a call to
`wasm.lsda()`, which will be lowered down to the address of LSDA, and
store it in a field in `__wasm_lpad_context`.
There are exceptions to this personality call insertion: catchpads for
`catch (...)` and cleanuppads (for destructors) don't need personality
function calls, because we don't need to figure out whether the current
exception should be caught or not. (They always should.)
There was a little optimization to `wasm.lsda()` call insertion. Because
the LSDA address is the same throughout a function, we don't need to
insert a store of `wasm.lsda()` return value in every catchpad. For
example:
```
try {
foo();
} catch (int) {
// wasm.lsda() call and a store are inserted here, like, in
// pseudocode,
// %lsda = wasm.lsda();
// store %lsda to a field in __wasm_lpad_context
try {
foo();
} catch (int) {
// We don't need to insert the wasm.lsda() and store again, because
// to arrive here, we have already stored the LSDA address to
// __wasm_lpad_context in the outer catch.
}
}
```
So the previous algorithm checked if the current catch has a parent EH
pad, we didn't insert a call to `wasm.lsda()` and its store.
But this was incorrect, because what if the outer catch is `catch (...)`
or a cleanuppad?
```
try {
foo();
} catch (...) {
// wasm.lsda() call and a store are NOT inserted here
try {
foo();
} catch (int) {
// We need wasm.lsda() here!
}
}
```
In this case we need to insert `wasm.lsda()` in the inner catchpad,
because the outer catchpad does not have one.
To minimize the number of inserted `wasm.lsda()` calls and stores, we
need a way to figure out whether we have encountered `wasm.lsda()` call
in any of EH pads that dominates the current EH pad. To figure that
out, we now visit EH pads in BFS order in the dominator tree so that we
visit parent BBs first before visiting its child BBs in the domtree.
We keep a set named `ExecutedLSDA`, which basically means "Do we have
`wasm.lsda()` either in the current EH pad or any of its parent EH
pads in the dominator tree?". This is to prevent scanning the domtree up
to the root in the worst case every time we examine an EH pad: each EH
pad only needs to examine its immediate parent EH pad.
- If any of its parent EH pads in the domtree has `wasm.lsda()`, this
means we don't need `wasm.lsda()` in the current EH pad. We also insert
the current EH pad in `ExecutedLSDA` set.
- If none of its parent EH pad has `wasm.lsda()`
- If the current EH pad is a `catch (...)` or a cleanuppad, done.
- If the current EH pad is neither a `catch (...)` nor a cleanuppad,
add `wasm.lsda()` and the store in the current EH pad, and add the
current EH pad to `ExecutedLSDA` set.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77423
Summary:
For current architect, we always require setContainingCsect to be
called on every MCSymbol got used in XCOFF context.
This is very hard to achieve because symbols gets created everywhere
and other MCSymbol types(ELF, COFF) do not have similar rules.
It's very easy to miss setting the containing csect, and we would
need to add a lot of XCOFF specialized code around some common code area.
This patch intendeds to do
1. Rely on getFragment().getParent() to get csect from labels.
2. Only use get/setRepresentedCsect (was get/setContainingCsect)
if symbol itself represents a csect.
Reviewers: DiggerLin, hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D77080
Summary: I think it would be better to require the alignment to be >= 1. It is currently confusing to allow both values.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77372
isGCValue should detect whether the deopt value is a GC pointer.
Currently it checks by finding the value in SI.Bases and SI.Ptrs.
However these data structures contain only those values which
have corresponding gc.relocate call. So we can miss GC value if it
does not have gc.relocate call (dead after the call).
Check GC strategy whether pointer is GC one or consider any pointer
to be GC one conservatively.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77130
When compiling AMDGPUDisassembler.cpp in a stage 1 trunk build with
CMAKE_BUILD_TYPE=RelWithDebInfo LLVM_USE_SANITIZER=Address LiveDebugVariables
accounts for 21.5% wall clock time. This fix reduces that to 1.2% by switching
out a linked list lookup with a map lookup.
Note that the linked list is still used to group UserValues by vreg. The vreg
lookups don't cause any problems in this pathological case.
This is the same idea as D68816, which was reverted, except that it is a less
intrusive fix.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D77226
The legalizer has a tendency to lose DebugLoc's when expanding or
combining instructions. The verifier that detected these isn't ready
for upstreaming yet but this patch fixes the cases that came up when
applying it to our out-of-tree backend's CodeGen tests.
This pattern comes up a few more times in this file and probably in
the backends too but I'd prefer to fix the others separately (and
preferably when the lost-location verifier detects them).
Summary:
Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised
with GetPromotedInteger, which leaves the upper bits of the value
undefind. Since this is used for comparing in an LR/SC loop with a
full-width comparison, we must sign extend it. We introduce a new
getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since
many targets have compare-and-swap instructions (or pseudos) that
correctly handle an any-extend input, and the existing function
determines the extension of the result, whereas we are concerned with
the input.
This is related to https://reviews.llvm.org/D58829, which solved the
issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler
ATOMIC_CMP_SWAP.
Reviewers: asb, lenary, efriedma
Reviewed By: asb
Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74453
Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square
root of x. However, this method would return NaN if x is +Inf, which
is incorrect.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D76853
Summary: These were templated due to SelectionDAG using int masks for shuffles and IR using unsigned masks for shuffles. But now that D72467 has landed we have an int mask version of IRBuilder::CreateShuffleVector. So just use int instead of a template
Reviewers: spatel, efriedma, RKSimon
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77183
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.
There 2 problems blocked the duplication:
1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.
The solutions:
1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.
Differential Revision: https://reviews.llvm.org/D76539
Summary:
In method SelectionDAGBuilder::LowerStatepoint, array SI.GCTransitionArgs
is initialized from wrong part of ImmutableStatepoint class.
We copy gc args instead of transitions args.
Reviewers: reames, skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77075
When we have
```
a = G_OR x, x
```
or
```
b = G_AND y, y
```
We can drop the G_OR/G_AND and just use x/y respectively.
Also update arm64-fallback.ll because there was an or in there which hits this
transformation.
Differential Revision: https://reviews.llvm.org/D77105
Implement identity combines for operations like the following:
```
%a = G_SUB %b, 0
```
This can just be replaced with %b.
Over CTMark, this gives some minor size improvements at -O3.
Differential Revision: https://reviews.llvm.org/D76640
This reverts commit b3297ef051.
This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
Summary:
This code was throwing away the opcode for a boolean, which was then
reconstructing the opcode from that boolean. Just pass the opcode, and
forget the boolean.
Reviewers: srhines
Reviewed By: srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77100
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.
The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.
`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.
Author: csyonghe <yonghe@google.com>
Reviewers: tpr, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76440
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76933
There was already a test case for landingpads to handle this case, but I
had forgotten to consider PHI instructions preceding the EH_LABEL in the
landingpad.
PR45261
MC already knows how to emulate the .weak directive (with its ELF
semantics; i.e., an undefined weak symbol resolves to 0, and a defined
weak symbol has lower link precedence than a strong symbol of the same
name) using COFF weak externals. Plumb this through the ASM printer too,
so that definitions marked with __attribute__((weak)) at the language
level (which gets translated to weak linkage at the IR level) have the
corresponding .weak directive emitted. Note that declarations marked
with __attribute__((weak)) at the language level (which translates to
extern_weak at the IR level) already have .weak directives emitted.
Weak*/linkonce* symbols without an associated comdat (in particular, ones
generated with __attribute__((weak)) in C/C++) were earlier emitted as
normal unique globals, as the comdat is required to provide the linkonce
semantics. This change makes sure they are emitted as .weak instead,
allowing other symbols to override them.
Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not
quite sure what that test covers, since the behavior being tested in it
(the emission of a one_only section) is just a result of passing
-function-sections to llc; the linkonce_odr makes no difference.
Add a new coff-weak.ll which tests the new directive emission.
Based on an previous patch by Shoaib Meenai.
Differential Revision: https://reviews.llvm.org/D44543
When we see this:
```
%a = COPY $physreg
...
SOMETHING implicit-def $physreg
...
%b = COPY $physreg
```
The two copies are not equivalent, and so we shouldn't perform any folding
on them.
When we have two instructions which use a physical register check that they
define the same virtual register(s) as well.
e.g., if we run into this case
```
%a = COPY $physreg
...
%b = COPY %a
```
we can say that the two copies are the same, and can be folded.
Differential Revision: https://reviews.llvm.org/D76890
Make these behave the same way unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes. The attribute is the actual
implementation, but the flag is useful in some testing situations.
AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.
Tests will be included in future patch.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
Summary: This patch is the first effort to adding basic optimizations for FREEZE in SelDag.
Reviewers: spatel, lebedev.ri
Reviewed By: spatel
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76707
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.
This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.
An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.
Differential Revision: https://reviews.llvm.org/D76649
SUMMARY:
SUMMARY
for a source file "test.c"
void foo() {};
llc will generate assembly code as (assembly patch)
.globl foo
.globl .foo
.csect foo[DS]
foo:
.long .foo
.long TOC[TC0]
.long 0
and symbol table as (xcoff object file)
[4] m 0x00000004 .data 1 unamex foo
[5] a4 0x0000000c 0 0 SD DS 0 0
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 LD DS 0 0
After first patch, the assembly will be as
.globl foo[DS] # -- Begin function foo
.globl .foo
.align 2
.csect foo[DS]
.long .foo
.long TOC[TC0]
.long 0
and symbol table will as
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 DS DS 0 0
Change the code for the assembly path and xcoff objectfile patch for llc.
Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D76162
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.
Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76845
In some scalarize/split result methods (unary, binary, ...), flags in
SDNode were not passed down, which may lead to unexpected results in
unsafe float-point optimization. This patch fixes them. (maybe not
complete)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D76832
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.
On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.
For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.
Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
Record the address of a tail-calling branch instruction within its call
site entry using DW_AT_call_pc. This allows a debugger to determine the
address to use when creating aritificial frames.
This creates an extra attribute + relocation at tail call sites, which
constitute 3-5% of all call sites in xnu/clang respectively.
rdar://60307600
Differential Revision: https://reviews.llvm.org/D76336
The current implementation collects all Preds/Succs of a Dep of kind Output, creating a long chain and subsequently a schedule with an unnecessarily large II.
Was this done on purpose for a reason I'm missing?
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D75424
When we find something like this:
```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```
We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.
Same if we have
```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```
where we can deduce that `%a == %b`.
This implements the following cases:
- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
are defined by identical instructions
This gives a few minor code size improvements on CTMark at -O3 for AArch64.
Differential Revision: https://reviews.llvm.org/D76523
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.
Implementation deferred to avoid dependency on other patches.
Otherwise, the Win64 unwinder considers direct branches to such empty
trailing BBs to be a branch out of the function. It treats such a branch
as a tail call, which can only be part of an epilogue. If the unwinder
misclassifies such a branch as part of the epilogue, it will fail to
unwind the stack further. This can lead to bad stack traces, or failure
to handle exceptions properly. This is described in
https://llvm.org/PR45064#c4, and by the comment at the top of the
X86AvoidTrailingCallPass.cpp file.
It should be safe to insert int3 for such blocks. An empty trailing BB
that reaches this pass is pretty much guaranteed to be unreachable. If
a program executed such a block, it would fall off the end of the
function.
Most of the complexity in this patch comes from threading through the
"EHFuncletEntry" boolean on the MIRParser and registering the pass so we
can stop and start codegen around it. I used an MIR test because we
should teach LLVM to optimize away these branches as a follow-up.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D76531
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)
This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats masks values as signed.
The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).
Differential Revision: https://reviews.llvm.org/D76508
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76551
When decided whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the current node, then don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.
Differential Revision: https://reviews.llvm.org/D75847
Extract the decision to combine into a post-inc address into a
couple of functions to make the logic more clear and re-usable.
Differential Revision: https://reviews.llvm.org/D76060
Summary:
Widening G_UNMERGE_VALUES to a type which is larger than the
original source type is the same as widening it to the same
type as the source type: in both cases, G_UNMERGE_VALUES has
to be replaced with bit arithmetic which. Although the arithmetic
itself is independent of whether the source type is smaller
or equal to the widen type, widening the source type to the
widen type should result in less artifacts being emitted,
since this is the type that the user explicitly requested.
Reviewers: arsenm, dsanders, aemerson, aditya_nandakumar
Reviewed By: arsenm, dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76494
For folding pattern `x-(fma y,z,u*v) -> (fma -y,z,(fma -u,v,x))`, if
`yz` is 1, `uv` is -1 and `x` is -0, sign of result would be changed.
Differential Revision: https://reviews.llvm.org/D76419
Technically we can permit EXTLOAD of the LHS operand but only if all the extended bits are shifted out. Until we test coverage for that case, I'm just disabling this to fix PR45265.
-fuse-init-array is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
X86SpeculativeLoadHardeningPass::tracePredStateThroughCall can call
MCContext::createTempSymbol before TargetLoweringObjectFileELF::Initialize().
We need to call TargetLoweringObjectFileELF::Initialize() ealier.
test/CodeGen/X86/speculative-load-hardening-indirect.ll
Differential Revision: https://reviews.llvm.org/D71360
Use the advanceToLowerBound operation available on CoalescingBitVector
iterators to speed up collection of variables which reside within some
set of registers.
The speedup comes from avoiding repeated top-down traversals in
IntervalMap::find. The linear scan forward from one register interval to
the next is unlikely to be as expensive as a full IntervalMap search
starting from the root.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
200ms (about 0.1% - 0.2% of the total User Time).
Depends on D76466.
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76467
Avoid making a heap allocation when constructing a CoalescingBitVector.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76465
UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
Differential Revision: https://reviews.llvm.org/D71360
Summary:
It can be the case that a vector type is legal but the corresponding
scalar type is not legal for an architecture (i8 vs. v16i8 on AArch64).
Check if the scalar type created when folding
truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
is legal if we are running after the type legalizer.
This fixes https://github.com/android/ndk/issues/1207.
Reviewers: RKSimon, srhines
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76312
Summary:
For some reason the order in which we call getNegatedExpression
for the involved operands, after a call to isCheaperToUseNegatedFPOps,
seem to matter. This patch includes a new test case in
test/CodeGen/X86/fdiv.ll that crashes if we reverse the order of
those calls. Before this patch that could happen depending on
which compiler that were used when buildind llvm. With my GCC
version (7.4.0) I got the crash, because it seems like it is
using a different order for the argument evaluation compared
to clang.
All other users of isCheaperToUseNegatedFPOps already used this
pattern with unfolded/ordered calls to getNegatedExpression, so
this patch is aligning visitFDIV with the other use cases.
This patch simply deals with the non-determinism for FDIV. While
the underlying problem with getNegatedExpression is discussed
further in D76439.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76319
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
Summary:
Related to D75672, this patch adds EVT::isFixedLengthVector to determine
if the underlying vector type is of fixed length.
An assert is introduced in EVT::getVectorNumElements that triggers for
types that aren't fixed length. This is currently guarded by a flag
added D75297 that is off by default and has been renamed to the more
generic ENABLE_STRICT_FIXED_SIZE_VECTORS.
Ideally we want to get rid of getVectorNumElements but a quick grep
shows there are >350 uses in lib/CodeGen and 75 in lib/Target/AArch64
alone. All of these probably aren't EVT::getVectorNumElements (some may
be the MVT equivalent), but there are many places to fixup and having
the assert on by default would make the SVE upstreaming effort
difficult.
Reviewers: sdesmalen, efriedma, ctetreau, huntergr, rengolin
Reviewed By: efriedma
Subscribers: mgorny, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76376
Gather/scatter don't access one memory location, they access multiple disjoint locations. So using a fixed size isn't accurate. But we don't have a way to represent the true behavior so just use UnknownSize.
Previously we "split" the memory VT and use that size for the MMO of each half. But the memory VT is scalar so splitting usually just returned the original scalar VT, but on 32-bit X86 if the scalar VT was i64 it probably returned i32?
Differential Revision: https://reviews.llvm.org/D76388
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debg instr.
Differential Revision: https://reviews.llvm.org/D76129
If it is a*b-c*d, it could be also folded into fma(a, b, -c*d) or fma(-c, d, a*b).
This patch is trying to respect the uses of a*b and c*d to make the best choice.
Differential Revision: https://reviews.llvm.org/D75982
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
Differential Revision: https://reviews.llvm.org/D76201
When compiling
```
struct S {
float w;
};
void f(long w, long b);
void g(struct S s) {
int w = s.w;
f(w, w*4);
}
```
I get Assertion failed: ((!CombinedExpr || CombinedExpr->isValid()) && "Combined debug expression is invalid").
That's because we combine two epxressions that both end in DW_OP_stack_value:
```
(lldb) p Expr->dump()
!DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p Param.Expr->dump()
!DIExpression(DW_OP_constu, 4, DW_OP_mul, DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p CombinedExpr->isValid()
(bool) $0 = false
(lldb) p CombinedExpr->dump()
!DIExpression(4097, 32, 5, 4097, 64, 5, 16, 4, 30, 4097, 32, 5, 4097, 64, 5, 159, 159)
```
I believe that in this particular case combining two stack values is
safe, but I didn't want to sink the special handling into
DIExpression::append() because I do want everyone to think about what
they are doing.
Patch by Adrian Prantl.
Fixes PR45181.
rdar://problem/60383095
Differential Revision: https://reviews.llvm.org/D76164
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.
Based on a previous patch by Krzysztof Parzyszek
Differential Revision: https://reviews.llvm.org/D75932
I believe we were previously calculating a pointer info with the scalar base and an offset of 0. But that's not really where the gather is pointing. The offset is a function of the indices of the GEP we looked through.
Also set the size of the MachineMemOperand to UnknownSize
Differential Revision: https://reviews.llvm.org/D76157
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.
This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579