Use pad with undef and unmerge with unused results. This is annoyingly
similar to several other places in LegalizerHelper, but they're all
slightly different.
The patch restricts DIEDelta::SizeOf() to accept only DWARF forms that
are actually used in the LLVM codebase. This should make the use of the
class more explicit and help avoid issues similar to those fixed in D83958
and D84094.
Differential Revision: https://reviews.llvm.org/D84095
DIELabel can emit only 32- or 64-bit values, while it was created in
some places with DW_FORM_udata, which implies emitting uleb128.
Nevertheless, these places also expected to emit U32 or U64, but just
used a misleading DWARF form. The patch updates those places to use more
appropriate DWARF forms and restricts DIELabel::SizeOf() to accept only
forms that are actually used in the LLVM codebase.
Differential Revision: https://reviews.llvm.org/D84094
DIELocList is used with a limited number of DWARF forms, see the only
place where it is instantiated, DwarfCompileUnit::addLocationList().
The patch marks the unexpected execution path in DIELocList::SizeOf()
as unreachable, to reduce ambiguity.
Differential Revision: https://reviews.llvm.org/D84092
For AMDGPU, vectors with elements < 32 bits should be indexed in
32-bit elements, with the desired bits extracted from there. Elements
wider than 64 bits should be reduced to 64/32-bit elements to enable
the normal dynamic indexing paths.
In the dynamic index cases, this produces shorter code most of the
time. This does immediately regress the constant index cases, but this
should be fixed once we have the most basic of shift combines.
The element size > 64 case is pretty much ported from the existing
DAG implementation for extract-element promotion. The increasing element
size case is new.
Try to be more consistent with the SDLoc param in the TargetLowering methods.
This also exposes an issue where we were passing an SDNode as an SDLoc, relying on the implicit SDLoc(SDNode) constructor.
D68049 created options for basic block sections: -fbasic-block-sections=,
-funique-basic-block-section-names. Rename options in llc and lld (--lto-)
to be consistent. Specifically,
+ Rename basicblock-sections to basic-block-sections
+ Rename unique-bb-section-names to unique-basic-block-section-names
Differential Revision: https://reviews.llvm.org/D84462
https://reviews.llvm.org/D84909
Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
Patch by mkitzan
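For illustration, a minimal IR sketch (hypothetical function name) that, once translated to GISel, produces the back-to-back cast pair the new rules rewrite into a plain copy of %x when the types share an address space:
```
define i64 @roundtrip(i64 %x) {
  %p = inttoptr i64 %x to i8*   ; becomes G_INTTOPTR
  %i = ptrtoint i8* %p to i64   ; becomes G_PTRTOINT; the pair folds to a copy of %x
  ret i64 %i
}
```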
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.
Differential Revision: https://reviews.llvm.org/D84706
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
In cases where the alignment of the datatype is smaller than
expected by the instruction, the address is aligned. The aligned
address is used for the load, but wasn't used for the store
conditional, which resulted in a run-time alignment exception.
Summary:
This patch implements -ffunction-sections on AIX.
This patch focuses on assembly generation.
Follow-on patch needs to handle:
1. -ffunction-sections implication for jump table.
2. Object file generation path and associated testing.
Differential Revision: https://reviews.llvm.org/D83875
Summary:
In the phi-node-elimination pass, we set the killed flag incorrectly.
When we eliminate the PHI node, we replace the PHI with a copy for the
incoming value.
Before this patch, we would mark the incoming value as killed at the new
copy (PHICopy) and remove the kill flag from the last use of the incoming
value (OldKill). This is correct only if the new PHICopy is after the OldKill.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80886
I still think it's highly questionable that we have two intrinsics
with identical behavior that vary only in the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
This change is mechanical; it just removes the restriction and updates tests. The key building blocks were submitted in 31342eb and 8fe2abc.
Note that this (and preceding changes) entirely subsumes D83965. I did include a couple of its tests.
From the codegen changes, an interesting observation: this doesn't actually reduce spilling, it just lets the register allocator do its job. That results in a slightly different overall result which has both pros and cons over the eager spill lowering. (i.e. We'll have some perf tuning to do once this is stable.)
Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.
This is submitted separately as it really does make the code more readable on its own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
Differential Revision: https://reviews.llvm.org/D84866
This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.
The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.
Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into its own change for ease of understanding and fault isolation.
The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and its source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.
I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.
Differential Revision: https://reviews.llvm.org/D84692
Summary:
When verifying LiveVariables, the MachineVerifier pass
recalculates the LiveVariables and compares the result with the
result the livevars pass produced. If they differ, verifyLiveVariables()
reports an error.
But when we calculate LiveVariables in the MachineVerifier, we don't
consider PHI nodes, while livevars does.
This patch fixes that bug.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80274
In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.
The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.
A new test is added. This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
umull r9, r9, lr, r0
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D82638
I have added tests to:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:
1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
Differential revision: https://reviews.llvm.org/D84016
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.
Fixes warnings in this test:
sve-sext-zext.ll
Differential revision: https://reviews.llvm.org/D83198
The refactoring encapsulates frequency calculation in MachineBlockFrequencyInfo,
and renames the API to clarify its motivation. It should clarify that
frequencies may not be reset 'freely' by users of the analysis, as the
API serves as a partial update to avoid a full analysis recomputation.
Differential Revision: https://reviews.llvm.org/D84427
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
The move constructor of MachineModuleInfo currently does not copy the
MachineFunctions map. This commit fixes this issue.
Patch by Sridhar Gopinath. Thanks!
Differential Revision: https://reviews.llvm.org/D84274
Speed up the method RequiresStackProtector by checking the intrinsic
value of the call. The original code calls getName(), which returns an
allocating std::string on each check. This change removes about 96072
std::string instances when compiling sqlite3.c. The hot spot was
discovered with a Facebook-internal performance tool.
Differential Revision: https://reviews.llvm.org/D84620
I have introduced a new TargetFrameLowering query function:
isStackIdSafeForLocalArea
that queries whether or not it is safe for objects of a given stack
id to be bundled into the local area. The default behaviour is to
always bundle regardless of the stack id; however, for AArch64 this is
overridden so that it's only safe for fixed-size stack objects.
There is future work here to extend this algorithm for multiple local
areas so that SVE stack objects can be bundled together and accessed
from their own virtual base-pointer.
Differential Revision: https://reviews.llvm.org/D83859
Store Addr and Store Addr+8 are a clusterable pair. They have memory (ctrl) dependencies on different loads.
The current implementation puts these two stores into different groups and fails to cluster them.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D84139
Summary:
In parallelizeChainedStores, a TokenFactor was created with a size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up the simplification in consideration of compile time.
Reviewers:
@spatel, @arsenm
Differential Revision:
https://reviews.llvm.org/D84204
(Disabled under flag for the moment)
This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator. Today, we force spill all operands in SelectionDAG. The code to do so is distinctly non-optimal. The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to. In terms of performance, the latter part is actually more important as it avoids redundant shuffling of values between stack slots.
This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above. In particular, the first N are lowered to variadic tied def/use pairs. So new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...
N is limited by the maximal number of tied registers a machine instruction can have (15 at the moment).
The current patch is restricted to handling relocations within a single basic block. Cross block relocations (e.g. invokes) are handled via the legacy mechanism. This restriction will be relaxed in future patches.
Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
Common up some existing MBB name printing logic into a single place.
Note that basic block dumping now prints the same set of attributes as
the MIRPrinter.
Change-Id: I8f022bbd922e831bc96d63143d7472c03282530b
Differential Revision: https://reviews.llvm.org/D83253
Emit DWARF 5 call-site symbols even when DWARF 4 is set,
but only when tuning for LLDB.
This patch addresses PR46643.
Differential Revision: https://reviews.llvm.org/D83463
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
The SONY debugger does not prefer the debug entry values feature, so
the plan is to avoid producing entry values
by default when tuning for the SCE debugger.
The feature still can be enabled with the -debug-entry-values
option for the testing/development purposes.
This patch addresses PR46643.
Differential Revision: https://reviews.llvm.org/D83462
In the included test case the align 16 allowed the v23f32 load to be handled as load v16f32, load v4f32, and load v4f32 (one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need two v4f32 undefs to pad it.
It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. Originally in r147964 by padding all loads narrower than previous loads to the same size. Later modified in r293088 to pad only the last load. This patch removes that earlier code and just handles it on demand where we know we need it.
Fixes PR46820
Differential Revision: https://reviews.llvm.org/D84463
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
Differential Revision: https://reviews.llvm.org/D84125
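As a quick illustration, a hypothetical IR snippet calling two of the new intrinsics (names per D81829):
```
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.umin.i32(i32, i32)

define i32 @clamp(i32 %x, i32 %hi) {
  %lo = call i32 @llvm.smax.i32(i32 %x, i32 0)    ; signed max(x, 0)
  %r  = call i32 @llvm.umin.i32(i32 %lo, i32 %hi) ; unsigned min(lo, hi)
  ret i32 %r
}
```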
Clarify the relation between a block's BlockFrequency and the
getEntryFreq() API, and add an API for the relatively common case of
finding a block's frequency relative to the entrypoint.
Added / moved some comments to header.
Differential Revision: https://reviews.llvm.org/D84357
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.
Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skip using the clamp bit (which has
never been used before) to treat these as legal. This also
doesn't yet try to deal with expanding SALU cases.
Summary: We do this already for output operands, but missed it for (non-tied) input operands.
Reviewers: arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83763
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.
define void @test(i4000000* %ptr) {
entry:
store i4000000 0, i4000000* %ptr, align 4
ret void
}
This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.
It would be even better to use a binary heap to manage the queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.
The patch includes a slightly smaller version of the motivating example
as test case, to guard against the issue.
Reviewers: efriedma, paquette, niravd
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84328
Test case `test/CodeGen/WebAssembly/stackified-debug.ll`
was failing due to malformed DwarfExpression.
This failure has been seen on a lot of bots, for instance in:
http://lab.llvm.org:8011/builders/lld-x86_64-ubuntu-fast/builds/18794
: 'RUN: at line 1'
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/llc
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/bin/FileCheck /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll
home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/test/CodeGen/WebAssembly/stackified-debug.ll:26:10: error: CHECK: expected string not found in input
CHECK: .int16 4 # Loc expr size
^
<stdin>:34:2: note: scanning from here
.int16 3 # Loc expr size
Differential Revision: https://reviews.llvm.org/D83560
This patch was reverted in 9d2da6759b due to assertion failure seen
in `test/DebugInfo/Sparc/subreg.ll`. Assertion failure was happening
due to a malformed/unhandled DwarfExpression.
Differential Revision: https://reviews.llvm.org/D83560
Summary:
LLVM is missing support for the DW_OP_implicit_value operation.
The DW_OP_implicit_value op is indispensable for cases such as
optimized-out long double variables.
For an introduction, refer to the DWARFv5 spec, page 40, Section 2.6.1.1.4, Implicit Location Descriptions.
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with trunk `clang` as
`clang test.c -g -O1`, it produces the following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect (the DWARF4
stack can only hold integers, and only up to the size of an address).
Here the variable size itself is `128` bits.
GDB and LLDB confirm this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in this sort of situation.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is the most appropriate
choice in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch (LLDB doesn't have support
for DW_OP_implicit_value):
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
Summary: These should have half float as the element type
Reviewers: cameron.mcinally, efriedma, sdesmalen, paulwalker-arm
Reviewed By: paulwalker-arm
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84211
This is an alternative proposal to D81476 (and D82084) - the details were sufficiently confusing to me that it seemed easier to write some code and see how it looks.
Reviewers: SouraVX
Differential Revision: https://reviews.llvm.org/D84278
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.
This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.
I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
Summary:
AIX assembly's .set directive is not usable for aliasing purposes.
We need to use the extra-label-at-definition strategy to generate symbol
aliasing on AIX.
Reviewed By: DiggerLin, Xiangling_L
Differential Revision: https://reviews.llvm.org/D83252
Summary:
This patch reduces file size in debug builds by dropping variable locations a
debugger user will not see.
After building the debug entity history map we loop through it. For each
variable we look at each entry. If the entry opens a location range which does
not intersect any of the variable's scope's ranges then we mark it for removal.
After visiting the entries for each variable we also mark any clobbering
entries which will no longer be referenced for removal, and then finally erase
the marked entries. This all requires the ability to query the order of
instructions, so before this runs we number them.
Tests:
Added llvm/test/DebugInfo/X86/trim-var-locs.mir
Modified llvm/test/DebugInfo/COFF/register-variables.ll
Branch folding merges the tails of if.then and if.else into if.else. Each
block's debug-locations point to different scopes, so when they're merged we
can't use either. Because of this the variable 'c' ends up with a location
range which doesn't cover any instructions in its scope; with the patch
applied the location range is dropped and its flag changes to IsOptimizedOut.
Modified llvm/test/DebugInfo/X86/live-debug-variables.ll
Modified llvm/test/DebugInfo/ARM/PR26163.ll
In both tests an out of scope location is now removed. The remaining location
covers the entire scope of the variable allowing us to emit it as a single
location.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D82129
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
The AMDGPU handling of f16 vectors is terrible still since it gets
scalarized even when the vector operation is legal.
The code is essentially duplicated between the non-strict and
strict cases. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
Current tail duplication in machine block placement pass uses block frequency
information in its cost model. But a frequency number has only relative meaning
compared to other basic blocks in the same function. A large frequency number
doesn't mean the block is hot and a small frequency number doesn't mean it is cold.
To overcome this problem, this patch uses the profile count in the cost model if it's
available, so we can tail duplicate really hot basic blocks.
Differential Revision: https://reviews.llvm.org/D83265
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.
Also handle mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
This isn't a natively supported operation, so convert it to a
mask+compare.
In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().
Differential Revision: https://reviews.llvm.org/D83811
Its effect could be achieved by
`-stop-after`, `-print-after`, and `-print-after-all`. But a few tests need to
print MIR after ISel, which could not be done with
`-print-after`/`-stop-after` since the isel pass does not have a command-line name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` can be
removed after we switch to the new pass manager, since the isel pass will then
have a command-line name usable with `print-after` or equivalent switches.
The motivation of this patch is to reduce tests' dependency on a
feature that will be deprecated.
Reviewed By: arsenm, dsanders
Differential Revision: https://reviews.llvm.org/D83275
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables the debugger to identify whether the
variable is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.
Differential Revision: https://reviews.llvm.org/D84010
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.
This patch checks whether it would be unsafe to expand the SCEV and, if so, stops converting the do-while into a while loop, allowing conversion to a hardware loop.
Differential Revision: https://reviews.llvm.org/D83953
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring to a debug value that gets folded into a TCRETURN
instruction on ARM.
This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators that produce
values which can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like there is currently no way to insert proper DBG_VALUEs
for such registers anyway.
Alternatively it might make sense to just remove those extra DBG_VALUES.
I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.
Reviewers: vsk, aprantl, jpaquette, efriedma, paquette
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83561
For now, DIEExpr is used only in two places:
1) in the debug info library unit test suite to emit
a DW_AT_str_offsets_base attribute with the DW_FORM_sec_offset
form, see dwarfgen::DIE::addStrOffsetsBaseAttribute();
2) in DwarfCompileUnit::addLocationAttribute() to generate the location
attribute for a TLS variable.
The latter case used an incorrect DWARF form of DW_FORM_udata, which
implies storing an uleb128 value, not a 4/8 byte constant. The generated
result was as expected because DIEExpr::SizeOf() did not handle the used
form, but returned the size of the code pointer by default.
The patch fixes the issue by using more appropriate DWARF forms for
the problematic case and making DIEExpr::SizeOf() more straightforward.
Differential Revision: https://reviews.llvm.org/D83958
Replace std::vector with SmallVector to reduce the number of mallocs.
This method is frequently executed, and the number of elements in the
vector is typically small.
https://reviews.llvm.org/D83920
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.
Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
Basic support for variadic-def MIR Statepoint:
- Change TableGen STATEPOINT description to variadic out list
(For self-documentation purpose; by itself it does not affect
code generation in any way).
- Update StatepointOpers helper class to handle variadic defs.
- Update MachineVerifier to properly handle them, too.
With this change, the new Statepoint instruction can be passed through
the backend (excluding ISEL) without errors.
Full change set is available at D81603.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81645
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associated value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legalize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.
Differential Revision: https://reviews.llvm.org/D83880
This function has a bug which will incorrectly reschedule instructions
after an INLINEASM_BR (which can branch). (The bug may also allow
scheduling past a throwing-CALL, I'm not certain.)
I could fix that bug, but, as the removed FIXME notes, it's better to
attempt rescheduling before converting to 3-addr form, as that may
remove the need to convert in the first place. In fact, the code to do
such reordering was added to this pass only a few months later, in
2011, via the addition of the function rescheduleMIBelowKill. That
code does not contain the same bug.
The removal of the sink3AddrInstruction function is not a no-op: in
some cases it would move an instruction post-conversion, when
rescheduleMIBelowKill would not move the instruction pre-conversion.
However, this does not appear to be important: the machine instruction
scheduler can reorder the after-conversion instructions, in any case.
This patch fixes a kernel panic on 4.4 LTS x86_64 Linux kernels, when
built with clang after 4b0aa5724f.
Link: https://github.com/ClangBuiltLinux/linux/issues/1085
Differential Revision: https://reviews.llvm.org/D83708
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.
Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma, david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83137
When we calculate the weight of a live-interval, add some code to
check whether the original live-interval was marked as not spillable and,
if so, propagate that information down to the new interval.
Previously we would just recompute a weight for the new interval;
thus, we could in theory spill live-intervals marked as not
spillable just by splitting them. That goes against the spirit of
a non-spillable live-interval.
E.g., previously we could do:
v1 = // v1 must not be spilled
...
= v1
Split:
v1 = // v1 must not be spilled
...
v2 = v1 // v2 can be spilled
...
v3 = v2 // v3 can be spilled
= v3
There's no test case for that one as we would need to split a
non-spillable live-interval without using LiveRangeEdit to see this
happening.
RegAlloc inserts non-spillable intervals only as part of the spilling
mechanism, thus at this point the intervals are not splittable anymore.
On top of that, RegAlloc uses the LiveRangeEdit API, which already
properly propagates that information.
In other words, this could only happen if a target was to mark
a live-interval as not spillable before register allocation and
split it without using LRE, e.g., through
LiveIntervals::splitSeparateComponent.
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.
Differential Revision: https://reviews.llvm.org/D83882
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).
So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter);
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
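A rough sketch of the intended use, assuming the new intrinsic mirrors the @llvm.read_register signature (the register name below is illustrative):
```
declare i64 @llvm.read_volatile_register.i64(metadata)

define i64 @read_counter() {
  ; Each call re-reads the register; unlike @llvm.read_register it is not
  ; treated as a plain memory read that can be hoisted or CSE'd.
  %v = call i64 @llvm.read_volatile_register.i64(metadata !0)
  ret i64 %v
}

!0 = !{!"cntvct_el0"}
```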
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.
This fixes PR42876.
Differential Revision: https://reviews.llvm.org/D65802
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) has to be emitted along with redefining the
Call Frame Address (CFA), viz. where the current frame starts.
CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region unlike
debug info which has the support for ranges. This is what complicates CFI for
basic block sections.
Now, what happens when we start placing individual basic blocks in unique
sections:
* Basic block sections allow the linker to randomly reorder basic blocks in the
address space such that a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit this
independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee saved registers in the prologue. There are cfi
directives that emit how to retrieve the value of the register at that point
when the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.
Differential Revision: https://reviews.llvm.org/D79978
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.
Differential Revision: https://reviews.llvm.org/D83709
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.
I've added a test to
llvm/test/CodeGen/AArch64/sve-fp.ll
that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.
Differential Revision: https://reviews.llvm.org/D83572
Caused by uninitialized load of llvm::DwarfDebug::PrevCU:
llvm::DwarfCompileUnit::addRange () at ../lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:276
llvm::DwarfDebug::endFunctionImpl () at ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1586
llvm::DebugHandlerBase::endFunction () at ../lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:319
llvm::AsmPrinter::EmitFunctionBody () at ../lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1230
llvm::ARMAsmPrinter::runOnMachineFunction () at ../lib/Target/ARM/ARMAsmPrinter.cpp:161
Most of the DebugInfo tests fail under `LLVM_LIT_ARGS:STRING=-sv --vg` prior to this fix, and pass with the fix applied.
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D81631
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.
x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision: https://reviews.llvm.org/D83567
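For reference, a hypothetical example of the shape this exception targets: a pair of consecutive loads stored in reversed order, which can now merge into a wider load plus a byte swap:
```
define void @store_rev16(i8* %dst, i8* %src) {
  %s1 = getelementptr inbounds i8, i8* %src, i64 1
  %b0 = load i8, i8* %src
  %b1 = load i8, i8* %s1
  %d1 = getelementptr inbounds i8, i8* %dst, i64 1
  ; The two stores reverse the order of the two consecutive loads.
  store i8 %b1, i8* %dst
  store i8 %b0, i8* %d1
  ret void
}
```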
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision:
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split
Reviewers: efriedma, sdesmalen, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83577
Check that input size matches size of destination reg class.
Attempt to extend input size when needed.
Differential Revision: https://reviews.llvm.org/D83384
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
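A hypothetical IR input that maps to the pattern above after ISel; per the description it is the "reassoc" on the final fadd that permits the refold (flags on the other operations are shown only for illustration):
```
declare double @llvm.fma.f64(double, double, double)

define double @fma_refold(double %a, double %b, double %c, double %d, double %e) {
  %mul = fmul reassoc double %c, %d
  %fma = call reassoc double @llvm.fma.f64(double %a, double %b, double %mul)
  %add = fadd reassoc double %fma, %e   ; --> fma a, b, (fma c, d, e)
  ret double %add
}
```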
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert
from getVectorNumElements() to getVectorElementCount(), since this
code path works for both fixed and scalable vectors.
This fixes up one warning in the test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83196
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.
Fixes warnings in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83203
Since the `RISCVExpandPseudo` pass has been split from
`RISCVExpandAtomicPseudo` pass, it would be nice to run the former as
early as possible (The latter has to be run as late as possible to
ensure correctness). Running earlier means we can reschedule these pairs
as we see fit.
Running earlier in the machine pass pipeline is good, but would mean
teaching many more passes about `hasLabelMustBeEmitted`. Splitting the
basic blocks also pessimises possible optimisations because some
optimisations are MBB-local, and others are disabled if the block has
its address taken (which is notionally what `hasLabelMustBeEmitted`
means).
This patch uses a new approach of setting the pre-instruction symbol on
the AUIPC instruction to a temporary symbol and referencing that. This
avoids splitting the basic block, but allows us to reference exactly the
instruction that we need to. Notionally, this approach seems more
correct because we do actually want to address a specific instruction.
This then allows the pass to be moved much earlier in the pass pipeline,
before both scheduling and register allocation. However, to do so we
must leave the MIR in SSA form (by not redefining registers), and so use
a virtual register for the intermediate value. By using this virtual
register, this pass now has to come before register allocation.
Reviewed By: luismarques, asb
Differential Revision: https://reviews.llvm.org/D82988
Summary:
When legalizing a bitcast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the operation's result size and the type specified to
the node.
This patch fixes the issue by making sure the bit widths of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82552
This appears to be a typo introduced in D69275, which may cause an unknown
segmentation fault in getNode.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83376
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this, and move all isel pseudo expansion to FinalizeISel. This was a
bad rebase or something, and failed to actually delete this call.
GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
It looks like 9cac4e6d14 accidentally
added a second copy of this in a bad rebase or something: the
second copy was added, but the finalizeLowering call was not deleted
as intended.
Updated the AArch64 tests the best I could with my vague, inferred
understanding of AArch64 register banks. As far as I can tell, there
is only one 32-bit/64-bit type which will use the gpr register bank,
so we have to use the fpr bank for the other operand.
This removes existing code duplication and allows us to
assert that we are handling the expected cases.
We have a list of outstanding bugs that could benefit by
handling truncated source values, so that's a possible
addition going forward.
ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS.
However, when calculating the offsets for each of the operands we
incorrectly use the element size rather than the actual size, and thus
the stores overlap.
Differential Revision: https://reviews.llvm.org/D83303
Summary:
The following combine currently breaks in the DAGCombiner:
```
extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x
-> extract_vector_elt a, x
```
This happens because after we have combined these nodes we have inserted nodes
that use individual instances of the vector element type. In the above example
i16. However this isn't a legal type on all backends, and when the combining pass calls
the legalizer it breaks as it expects types to already be legal. The type legalizer has
already been run, and running it again would make a mess of the nodes.
In the example code at least, the generated code is still efficient after the change.
Reviewers: miyuki, arsenm, dmgreen, lebedev.ri
Reviewed By: miyuki, lebedev.ri
Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83231
Occasionally we see absolutely massive basic blocks, typically in global
constructors that are vulnerable to heavy inlining. When these blocks are
dense with DBG_VALUE instructions, we can hit near quadratic complexity in
DwarfDebug's validThroughout function. The problem is caused by:
* validThroughout having to step through all instructions in the block to
examine their lexical scope,
* and a high proportion of instructions in that block being DBG_VALUEs
for a unique variable fragment,
Leading to us stepping through every instruction in the block, for (nearly)
each instruction in the block.
By adding this guard, we force variables in large blocks to use a location
list rather than a single-location expression, as shown in the added test.
This shouldn't change the meaning of the output DWARF at all: instead we
use a less efficient DWARF encoding to avoid a poor-performance code path.
Differential Revision: https://reviews.llvm.org/D83236
In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid
call to getVectorNumElements() with a call to getVectorMinNumElements(),
since the code path works for both fixed and scalable vectors.
This fixes up a warning in the following test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83197
Calling getVectorNumElements() is not safe for scalable vectors and we
should normally use getVectorElementCount() instead. However, for the
code changed in this patch I decided to simply move the instantiation of
the variable 'OutNumElems' lower down to the place where only fixed-width
vectors are used, and hence it is safe to call getVectorNumElements().
Fixes up one warning in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83195
For the GetElementPtr case in function
AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now, as the subsequent optimisations don't currently support them.
This change fixes up all warnings in the following tests:
llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
Differential Revision: https://reviews.llvm.org/D83124
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This could generate an
invalid binary for functions with a non-void return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
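The equivalent IR-level shape of the failure path, as a minimal sketch (hypothetical function; the actual fix applies during WebAssembly code generation):
```
declare void @__stack_chk_fail()

define i32 @ssp_fail_path() {
entry:
  call void @__stack_chk_fail()
  ; The trailing unreachable keeps this i32-returning function well-formed
  ; even though the noreturn call is its last instruction.
  unreachable
}
```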
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
This is inspired by D81648. The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic.
This is not NFC in that the handling of constants larger than 64 bits has changed. The old lowering would crash on values which could not be encoded as a sign extended 64 bit value. The new lowering just spills all constants > 64 bits. We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.
handleMoveDown or handleMoveUp cannot properly repair a main
range of a LiveInterval since they only get LiveRange. There
is a problem if certain use has moved few segments away and
there is a hole in the main range in between of these two
locations. We may get a SubRange with a very extended Segment
spanning several Segments of the main range and also spanning
that hole. If that happens then we end up with the main range
not covering its SubRange which is an error.
It might be possible to attempt fixing the main range in place
just between the old and new indexes by extending all of its
Segments in between, but it is unclear whether this logic would be
faster than just straight constructMainRangeFromSubranges,
which itself is pretty cheap since it only contains interval
logic. That will also require shrinkToUses() call after which
is probably even more expensive.
In the test second move is from 64B to 92B for the sub1.
Subrange is correctly fixed:
L000000000000000C [16r,32B:0)[32B,92r:1) 0@16r 1@32B-phi
But the main range has a hole in between 80d and 88r after
updateRange():
%1 [16r,32B:0)[32B,80r:4)[80r,80d:3)[88r,96r:1)[96r,160B:2)
Since source position is 64B this segment is not even considered
by the updateRange().
Differential Revision: https://reviews.llvm.org/D82916
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83041
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (vscale(C0 + C1))
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.
I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.
Differential Revision: https://reviews.llvm.org/D82524
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyToPartsVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
Differential Revision: https://reviews.llvm.org/D83028
X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z)
In the motivating case from PR46406:
https://bugs.llvm.org/show_bug.cgi?id=46406
...this is restoring the sequence that was originally in the source code.
We extracted a term from within the sqrt because we do not know in
instcombine whether a target will expand a sqrt call.
Note: we could say that the transform in IR should be restricted, but
that would not solve the problem if the source was originally in the
pattern shown here.
This is a gray area for fast-math-flag requirements. I think we should at
least check fast-math-flags on the fdiv and fmul because I view this
transform as 2 pieces: reassociate the fmul operands and form reciprocal
from the fdiv (as with the existing transform). We could argue that the
sqrt also needs FMF, but that was not required before, so we should change
that in a follow-up patch if that seems better.
We don't currently have a way to check that the target will produce a sqrt
or recip estimate without actually creating nodes (the APIs are SDValue
getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up
speculatively created nodes if we are not able to create an estimate.
The x86 test with doubles verifies that we are not changing a test with
no estimate sequence.
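For reference, a hedged IR-level sketch of the motivating pattern (the combine itself
runs on DAG nodes, and the final rsqrt expansion is target dependent):

  define double @f(double %x, double %a, double %z) {
    %fabs = call fast double @llvm.fabs.f64(double %a)
    %sqrt = call fast double @llvm.sqrt.f64(double %z)
    %mul  = fmul fast double %fabs, %sqrt
    %div  = fdiv fast double %x, %mul    ; X / (fabs(A) * sqrt(Z))
    ret double %div
  }
  declare double @llvm.fabs.f64(double)
  declare double @llvm.sqrt.f64(double)

With an rsqrt estimate available, the fdiv/fmul pair is reassociated into the
X * rsqrt(A*A*Z) form shown above.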
Differential Revision: https://reviews.llvm.org/D82716
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.
New methods are introduced to cover common access patterns.
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83083
Summary:
When a desired symbol name contains an invalid character that the
system assembler cannot process, we need to emit a .rename
directive on the assembly path in order for that desired symbol name
to appear in the symbol table.
Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L
Differential Revision: https://reviews.llvm.org/D82481
This matches the DAG behavior where this is called after the loop
checking for calls. The AMDGPU implementation depends on knowing if
there are calls in the function or not, so move this later.
Another problem is that finalizeLowering is actually called twice; I was
seeing weird inconsistencies since the first call would produce
unexpected results and the second run would correct them in some
contexts. Since this requires disabling the verifier, and it's useful
to serialize the MIR immediately after selection, FinalizeISel should
probably not be a real pass.
Use a simpler code sequence when the shift amount is known not to be
zero modulo the bit width.
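A hedged sketch of the shorter sequence for a 32-bit funnel shift, assuming the amount
is already reduced to the range [1, 31] (with an amount of 0 the lshr below would shift
by 32, which is poison, so the general expansion needs extra fix-up):

  define i32 @fshl_nonzero(i32 %x, i32 %y, i32 %z) {
    %shl  = shl i32 %x, %z
    %inv  = sub i32 32, %z
    %lshr = lshr i32 %y, %inv
    %res  = or i32 %shl, %lshr   ; fshl(%x, %y, %z)
    ret i32 %res
  }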
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
Differential Revision: https://reviews.llvm.org/D82540
Using a negation instead of a subtraction from a constant can save an
instruction on some targets.
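For example (a hedged sketch): when the shift amount is masked to the bit width anyway,
(32 - %z) and (0 - %z) agree modulo 32, so the subtraction from a constant can become a
plain negation:

  define i32 @high_part(i32 %y, i32 %z) {
    %neg = sub i32 0, %z      ; instead of sub i32 32, %z
    %amt = and i32 %neg, 31   ; same value modulo 32
    %hi  = lshr i32 %y, %amt
    ret i32 %hi
  }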
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
Differential Revision: https://reviews.llvm.org/D82539
We need to ensure that the sign bits of the result all match
so we can't fold to undef.
Similar to PR46585.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D83163
zext_vector_inreg needs to produce 0s in the extended bits and
sext_vector_inreg needs to produce upper bits that are all the
same. So we should fold them to a 0 vector instead of undef.
Fixes PR46585.
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.
Before this it's likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.
Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.
Fixes PR37890
Given a loop with two subloops, it should be possible for both to be
converted to hardware loops. That's what this patch does, simply enough.
It slightly alters the loop iterating order to try and convert all
subloops. If one (or more) succeeds, it stops as before.
Differential Revision: https://reviews.llvm.org/D78502
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
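For reference, a hedged sketch of the kind of pattern in question, where the branch
condition is assembled from bools extracted out of a vector compare (names illustrative):

  define i32 @both_lanes(<4 x i32> %a, <4 x i32> %b) {
  entry:
    %cmp = icmp eq <4 x i32> %a, %b
    %c0  = extractelement <4 x i1> %cmp, i32 0
    %c1  = extractelement <4 x i1> %cmp, i32 1
    %and = and i1 %c0, %c1
    br i1 %and, label %then, label %else
  then:
    ret i32 1
  else:
    ret i32 0
  }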
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
Differential Revision: https://reviews.llvm.org/D82602
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).
When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:
extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
This patch fixes both issues.
Reviewers: david-arm, efriedma, spatel
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82910
Whilst trying to assemble the following test:
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c
I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.
Differential Revision: https://reviews.llvm.org/D82744
This patch uses ranges for debug information when a function contains basic block sections rather than using [lowpc, highpc]. This is also the first in a series of patches for debug info and does not contain the support for linker relaxation. That will be done as a follow up patch.
Differential Revision: https://reviews.llvm.org/D78851
The caller can't handle the node having multiple results like a
masked load does. So we need to detect the case and do our own
result replacement.
Fixes PR46532.
In visitSCALAR_TO_VECTOR we try to optimise cases such as:
scalar_to_vector (extract_vector_elt %x)
into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.
This change fixes up one of the warnings in:
llvm/test/CodeGen/AArch64/sve-merging-stores.ll
I've also added a simplified version of the same test to:
llvm/test/CodeGen/AArch64/sve-fp.ll
which already has checks for no warnings.
Differential Revision: https://reviews.llvm.org/D82872
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker, which stops outlining from making code size
worse in C++ code built with no-exceptions.
Differential Revision: https://reviews.llvm.org/D57254
As per documentation of `hasPairLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.
Differential Revision: https://reviews.llvm.org/D82958
It's perfectly valid to do certain DAG combines where we extract
subvectors from a concat vector when we have scalable vector types.
However, we can do this in a way that avoids generating compiler
warnings by replacing calls to getVectorNumElements() with
getVectorMinNumElements(). Due to the way subvector extracts are
designed to work with scalable vector types this is ok.
This eliminates some warnings from existing tests in this file:
llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D82655
Summary:
This is a fix for PR45009.
When working on D67492 I made DwarfExpression emit a single
DW_OP_entry_value operation covering the whole composite location
description that is produced if a register does not have a valid DWARF
number, and is instead composed of multiple register pieces. Looking
closer at the standard, this appears to not be valid DWARF. A
DW_OP_entry_value operation's block can only be a DWARF expression or a
register location description, so it appears to not be valid for it to
hold a composite location description like that.
See DWARFv5 sec. 2.5.1.7:
"The DW_OP_entry_value operation pushes the value that the described
location held upon entering the current subprogram. It has two
operands: an unsigned LEB128 length, followed by a block containing a
DWARF expression or a register location description (see Section
2.6.1.1.3 on page 39)."
Here is a dwarf-discuss mail thread regarding this:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-March/004610.html
There was not a strong consensus reached there, but people seem to lean
towards that operations specified under 2.6 (e.g. DW_OP_piece) may not
be part of a DWARF expression, and thus the DW_OP_entry_value operation
can't contain those.
Perhaps we instead want to emit an entry value operation for each
DW_OP_reg* operation, e.g.:
- DW_OP_entry_value(DW_OP_regx sub_reg0),
DW_OP_stack_value,
DW_OP_piece 8,
- DW_OP_entry_value(DW_OP_regx sub_reg1),
DW_OP_stack_value,
DW_OP_piece 8,
[...]
The question then becomes how the call site should look; should a
composite location description be emitted there, and we then leave it up
to the debugger to match those two composite location descriptions?
Another alternative could be to emit a call site parameter entry for
each sub-register, but firstly I'm unsure if that is even valid DWARF,
and secondly it seems like that would complicate the collection of call
site values quite a bit. As far as I can tell GCC does not emit any
entry values / call sites in these cases, so we do not have something to
compare with, but the former seems like the more reasonable approach.
Currently when trying to emit a call site entry for a parameter composed
of multiple DWARF registers a (DwarfRegs.size() == 1) assert is
triggered in addMachineRegExpression(). Until the call site
representation is figured out, and until there is use for these entry
values in practice, this commit simply stops the invalid DWARF from
being emitted.
Reviewers: djtodoro, vsk, aprantl
Reviewed By: djtodoro, vsk
Subscribers: jyknight, hiraditya, fedor.sergeev, jrtc27, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D75270
While validating live-out values, record instructions that look like
a reduction. This will consist of a vector op (for now only vadd),
a vorr (vmov) which stores the previous value of the vadd, and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.
Differential Revision: https://reviews.llvm.org/D75533
It's pretty silly to diagnose on a scalar copy but the build does that:
loop variable 'SibReg' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]
With an undef operand, it's possible for getVRegDef to fail and return
null. This is an edge case very little code bothered to
consider. Proper gMIR should use G_IMPLICIT_DEF instead.
I initially tried to apply this restriction to all SSA MIR, so then
getVRegDef would never fail anywhere. However, ProcessImplicitDefs
does technically run while the function is in SSA. ProcessImplicitDefs
and DetectDeadLanes would need to either move, or a new pseudo-SSA
type of function property would need to be introduced.
Basically an NFC, but allows subclasses access to the entire PeelingModuloScheduleExpander
class. We are doing this to allow backends, particularly ones that are not necessarily
upstreamed, to inherit from PeelingModuloScheduleExpander and access its basic structures.
Renames Info into LoopInfo for consistency in PeelingModuloScheduleExpander.
Differential Revision: https://reviews.llvm.org/D82673
In RISC-V vector extension, users could group multiple vector registers
as one pseudo register. In mixed width operations, users could use
partial vector registers to reduce the register pressure. The parameter
to control register grouping and partial use is called LMUL. LMUL is a
part of the type. So, we have a bunch of vector types. In order to
support all these types, we need new MVT types in LLVM. In this patch, I
added several MVT types that are used in RISC-V vector implementation.
This is a standalone patch for MVT types without RISC-V related implementation.
Differential revision: https://reviews.llvm.org/D81724
This is a followup on D78403.
I'm unsure about the `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`; shouldn't `getAlign` provide the correct answer already?
Differential Revision: https://reviews.llvm.org/D81369
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D82746
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of a SVE vector element and
therefore bail out.
Differential Revision: https://reviews.llvm.org/D82564
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.
This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.
Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.
Differential Revision: https://reviews.llvm.org/D82257
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen.
We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors.
It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one.
Fixes PR46461.
The translation of cmpxchg added by
9481399c0f specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.
This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
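For reference, the IR form being translated looks roughly like this (illustrative
names and types), including the boolean success output discussed above:

  define i1 @try_swap(i32* %ptr, i32 %expected, i32 %new) {
    %pair = cmpxchg weak i32* %ptr, i32 %expected, i32 %new seq_cst seq_cst
    %ok   = extractvalue { i32, i1 } %pair, 1   ; success flag
    ret i1 %ok
  }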
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
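Conceptually (a hedged sketch, not the literal nodes that are built), for a 4-lane mask
the lowering compares a vector of lane indices starting at the first operand against a
splat of the second operand:

  ; get.active.lane.mask(%base, %n), roughly:
  ;   %lanes = <%base+0, %base+1, %base+2, %base+3>
  ;   %bound = <%n, %n, %n, %n>
  ;   %mask  = icmp ule <4 x i32> %lanes, %bound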
Differential Revision: https://reviews.llvm.org/D82292
Summary:
The printer seems to intend to not print the trailing comma but has a
copy-paste error for the last value in the escape, and the parser
enforces having no trailing comma, but somehow a test was never included
to actually confirm it.
Reviewers: thegameg, arsenm
Reviewed By: thegameg, arsenm
Subscribers: wdng, arsenm, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82478
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
Implement them on top of sdiv/udiv, similar to what we do for integer
types.
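That is, a hedged sketch of the expansion, shown here for i32:

  define i32 @srem_expand(i32 %a, i32 %b) {
    %q = sdiv i32 %a, %b
    %m = mul i32 %q, %b   ; the mul+sub pair is the mls candidate mentioned below
    %r = sub i32 %a, %m   ; equals srem i32 %a, %b
    ret i32 %r
  }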
Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
Differential Revision: https://reviews.llvm.org/D81511
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.
Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:
V = load(Addr) =>
V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))
store(V, Addr) =>
masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)
Reviewers: rengolin, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80385
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.
This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html
Differential Revision: https://reviews.llvm.org/D81852
We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.
We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phis are passed across basic block boundaries. This has to be done in
CodegenPrepare as it naturally straddles basic blocks.
The algorithm just starts from phi nodes and looks at their uses and operands for
a collection of nodes that all together are bitcast between float and
integer types. We record visited phi nodes so as not to have to process them
more than once. The whole subgraph is then replaced with a new type.
Loads and Stores are bitcast to the correct type, which should then be
folded into the load/store, changing its type.
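A minimal hedged example of the idea (names illustrative): an integer phi whose value is
only ever used through a bitcast to float is retyped, and the casts are pushed to its
incoming values, where they can fold into the defining loads:

  ; before: integer phi only used via a bitcast to float
  %p = phi i32 [ %a, %bb0 ], [ %b, %bb1 ]
  %f = bitcast i32 %p to float
  ; after: the phi is retyped and the casts move to the incoming blocks
  %a.f = bitcast i32 %a to float   ; in %bb0
  %b.f = bitcast i32 %b to float   ; in %bb1
  %p.f = phi float [ %a.f, %bb0 ], [ %b.f, %bb1 ]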
This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.
I also added undef and extract element handling which increased the
potency in some cases.
This adds it with an option that defaults to off, and disabled for 32bit
X86 due to potential issues around canonicalizing NaNs.
Differential Revision: https://reviews.llvm.org/D81827
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:
1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
vectors.
For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.
I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to
CodeGen/AArch64/GlobalISel/arm64-fallback.ll
that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.
Differential Revision: https://reviews.llvm.org/D81557
Without this fix, handleMoveUp can create an invalid live range like
this:
[98904e,98908r:0)[98908e,227504r:1)
where the two segments overlap, but only because we have lost the "e"
(early-clobber) on the end point of the first segment.
Differential Revision: https://reviews.llvm.org/D82110
For now I have changed SimplifyDemandedBits and its various callers
to assume we know nothing for scalable vectors and to ignore the
demanded bits completely. I have also done something similar for
SimplifyDemandedVectorElts. These changes fix up lots of warnings
due to calls to EVT::getVectorNumElements() for types with scalable
vectors. These functions are all used for optimisations, rather than
functional requirements. In future we can revisit this code if
there is a need to improve code quality for SVE.
Differential Revision: https://reviews.llvm.org/D80537
When trying to calculate the number of sign bits for scalable vectors
we should just bail out for now and pretend we know nothing.
Differential Revision: https://reviews.llvm.org/D81093
Summary:
Extend StackLifetime with an option to calculate liveness
where an alloca is only considered alive on basic block entry
if all non-dead predecessors had it alive at their terminators.
Depends on D82043.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82124
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.
This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add the common
parameters back in addition to the helper.
I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.
This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, such as causing
arguments to be stored in the wrong bits on big-endian architectures
and missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point conversions.
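For illustration (a hedged sketch), the backend now lowers IR like the following
directly, with %a expected in the low 16 bits of s0 under the AAPCS hard-float ABI,
rather than relying on the frontend coercing it to float or i32:

  define half @first_arg(half %a, half %b) {
  entry:
    ret half %a
  }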
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
We're missing a plain English explanation of how this pass is supposed
to operate -- add one to the file comment.
Differential Revision: https://reviews.llvm.org/D80929
Added a NextPowerOf2() routine to TypeSize and rewrote the code
in getVectorTypeBreakdown to avoid warnings being generated.
Differential Revision: https://reviews.llvm.org/D81578
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.
I discovered these warnings when compiling the structured loads tests in
this file:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81936
Summary:
This invariant is being violated in the test case
https://reviews.llvm.org/D77849, related to the use of the relatively
new ability for callbr to have return values, and MachineBasicBlocks
with INLINEASM_BR terminators to emit live out register defs.
As noted in the comment, this triggers invariant violations in
MachineVerifier via `llc -verify-machineinstrs` or
`llc -verify-regalloc`, since only MachineInstrs that are terminators
are allowed to follow the first terminator.
https://reviews.llvm.org/D75098 may rework this very assertion if we're
spilling via a (proposed) TCOPY MachineInstr.
Reviewers: void, efriedma, arsenm
Reviewed By: efriedma
Subscribers: qcolombet, wdng, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78166
When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience, causing an unexpected
jump in stepping at -Og.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).
Differential Revision: https://reviews.llvm.org/D81437
Summary:
Add a flag to omit the xray_fn_idx to cut size overhead and relocations
roughly in half at the cost of reduced performance for single function
patching. Minor additions to compiler-rt support per-function patching
without the index.
Reviewers: dberris, MaskRay, johnislarry
Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81995
Summary:
This code is going to be used in StackSafety.
This patch is file move with minimal changes. Identifiers
will be fixed in the followup patch.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81831
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
Differential Revision: https://reviews.llvm.org/D81875
The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.
Differential Revision: https://reviews.llvm.org/D81879
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
I noticed that add combines are some of the most commonly hit across CTMark,
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
Summary:
Teach MachineVerifier to check branches for MBB operands if they are not declared indirect.
Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`.
Without these, `MachineInstr.isConditionalBranch()` was giving a
false-positive for those instructions.
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587