This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
In various circumstances, when we clobber a register there may be
alternative locations that the value is live in. The classic example would
be a value loaded from the stack, and then clobbered: the value is still
available on the stack. InstrRefBasedLDV was coping with this at block
starts where it's forced to pick a location, however it wasn't searching
for alternative locations when values were clobbered.
This patch notifies the "Transfer Tracker" object when clobbers occur, and
it's able to find alternatives and issue DBG_VALUEs for that location. See:
the added test.
Differential Revision: https://reviews.llvm.org/D88405
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
GlobalISel is relying on regular MachineMemOperands to track all of
the memory properties of accesses. Just the raw byte size is
insufficent to disambiguate all situations. For example, if we need to
split an unaligned extending load, we need to know the number of bits
in the original source value and can't infer it from the result
type. This is also a problem for extending vector loads.
This does decrease the maximum representable size from the full
uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this,
other than places using UINT64_MAX for unknown sizes. This may be an
issue for G_MEMCPY and co., although they can just use unknown size
for large static sizes. This also has potential for backend abuse by
relying on the type when it really shouldn't be relevant after
selection.
This does not include the necessary MIR printer/parser changes to
represent this.
We were trying to expand these if they were going to be expanded
in op legalization so that we generated the minimum number of
operations. We failed to take into account that NVT could be
promoted to another legal type in op legalization.
Hoping this fixes the issue on the VE target reported as a follow
up to D96681. The check line changes were taken from before
1e46b6f401 so this patch does
appear to improve some cases that had previously regressed.
This patch reads machine value numbers from DBG_PHI instructions (marking
where SSA PHIs used to be), and matches them up with DBG_INSTR_REF
instructions that refer to them. Essentially they are two separate parts of
a DBG_VALUE: the place to read the value (register and program position),
and where the variable is assigned that value.
Sometimes these DBG_PHIs can be duplicated, usually by tail duplication.
This corresponds to the SSA structure of the program being destroyed, and
the original PHI being split. When this happens: run LLVMs standard
SSAUpdater utility, to work out what values should appear in which blocks.
The majority of this patch is boilerplate to make use of SSAUpdater.
If there are any additional PHIs on the path between multiple DBG_PHIs and
their using DBG_INSTR_REF, their existance is validated, just in case a
value gets clobbered along the way (see dbg-phis-with-loops.mir for
several examples).
Differential Revision: https://reviews.llvm.org/D86814
- Add standalone metadata parsing support so that machine metadata nodes
could be populated before and accessed during MIR is parsed.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103282
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:
* DIExpression can currently be parsed from IR or read from bitcode
as `distinct`, but this property is silently dropped when printing
to IR. This causes accepted IR to fail to round-trip. As DIExpression
appears inline at each use in the canonical form of IR, it cannot
actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
restricted to only appearing in contexts where there is no syntax for
`distinct`, but for consistency it is treated equivalently to
DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
along with adding general support for the inverse restriction I went
ahead and described this in Metadata.def and updated the parser to be
general. Future nodes which have this restriction can share this
support.
The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.
The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.
A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:
!named = !{!0}
!0 = !DIExpression()
Instead we would only accept the equivalent inlined version:
!named = !{!DIExpression()}
This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:
!named = !{!0}
; error: 'distinct' not allowed for !DIExpression()
!0 = distinct !DIExpression()
Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.
Reviewed By: StephenTozer, t-tye
Differential Revision: https://reviews.llvm.org/D104827
This intrinsic blocks floating point transformations by the optimizer.
Author: Pengfei
Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn
Differential Revision: https://reviews.llvm.org/D99675
This patch relands https://reviews.llvm.org/D104454, but fixes some failing
builds on Mac OS which apparently has a different definition for size_t,
that caused 'ambiguous operator overload' for the implicit conversion
of TypeSize to a scalar value.
This reverts commit b732e6c9a8.
Peephole optimizer should not be introducing sub-reg definitions
as they are illegal in machine SSA phase. This patch modifies
the optimizer to not emit sub-register definitions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103408
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.
The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.
There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.
Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.
Differential Revision: https://reviews.llvm.org/D100149
A combination of features ^ that lead to a mismatch of expectations
about how a subprogram definition DIE would be produced with/without a
declaration when taking full -g debug info and inlining it into a -gmlt
CU - specifically when using Split DWARF that doesn't support cross-CU
references, so we have to put the -g debug info into the -gmlt CU, which
gets confusing about which mode is respected.
This patch comes down on respecting the CU the debug info is emitted
into, rather than preserving the full debug info when it's emitted into
the gmlt CU.
This ports the AArch64 SABD and USBD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
a ubds/abdu which is zexted to the original type.
Differential Revision: https://reviews.llvm.org/D91937
This add as a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.
Differential Revision: https://reviews.llvm.org/D103755
This change is NFC upstream. We pass in the loop's block to the kernel
rewriter explicitly, instead of assuming it's the loop's top block. This
change is made for downstream targets where this assumption doesn't hold.
Differential Revision: https://reviews.llvm.org/D104811
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.
This patch is NFC for fixed-width vectors.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104454
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.
The affected ARM test changes because a constant become non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.
Reviewed By: lebedev.ri, dmgreen
Differential Revision: https://reviews.llvm.org/D104832
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector
The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
same number of elements, then use LLT::vector(OtherTy.getElementCount())
or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
or operator*. That is because there is no reason to specifically restrict
the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
just use fixed_vector. This will need to be fixed up in the future when
modifying the algorithm to also work for scalable vectors, and will need
then need additional tests to confirm the behaviour works the same for
scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
this is replaced by LLT::scalable_vector.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104451
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc.
Having type symmetry with these is somewhat necessary when implementing support for 192-bit values.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104621
Stats added:
1. NumCleanupLandingPadsUnreachable: how many cleanup landing pads were optimized as unreachable
1. NumCleanupLandingPadsRemaining: how many cleanup landing pads remain
1. NumNoUnwind: Number of functions with nounwind attribute
1. NumUnwind: Number of functions with unwind attribute
DwarfEHPrepare is always run a single time as part of `TargetPassConfig::addISelPasses()` which makes it an ideal place near the end of the pipeline to record this information.
Example output from clang built with exceptions cumulative during thinLTO backend (NumCleanupLandingPadsUnreachable was not incremented):
"dwarfehprepare.NumCleanupLandingPadsRemaining": 123660,
"dwarfehprepare.NumNoUnwind": 323836,
"dwarfehprepare.NumUnwind": 472893,
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104161
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that to use sext.w instead of a pair
of shifts.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D104612
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104807
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.
Differential Revision: https://reviews.llvm.org/D104807
The is from discussion in https://reviews.llvm.org/D104247#inline-993387
The contract and reassoc flags shouldn't imply each other .
All the aggressive fsub fusion reassociate operations,
we should guard them with reassoc flag check.
Reviewed By: mcberg2017
Differential Revision: https://reviews.llvm.org/D104723
Summary:
generate eh_info when vector registers are saved according to the traceback table.
struct eh_info_t {
unsigned version; /* EH info version 0 */
#if defined(64BIT)
char _pad[4]; /* padding */
#endif
unsigned long lsda; /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};
the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
This patch aims to add the scalable property to LLT. The rest of the
patch-series changes the interfaces to take/return ElementCount and
TypeSize, which both have the ability to represent the scalable property.
The changes are mostly mechanical and aim to be non-functional changes
for fixed-width vectors.
For scalable vectors some unit tests have been added, but no effort has
been put into making any of the GlobalISel algorithms work with scalable
vectors yet. That will be left as future work.
The work is split into a series of 5 patches to make reviews easier.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104450
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving. This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.
We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.
Differential Revision: https://reviews.llvm.org/D103338
According to IR LangRef, the FMF flag:
contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).
reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.
My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.
eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.
%F = fmul reassoc double %A, %B ; <double> [#uses=1]
%G = fmul reassoc double %C, %D ; <double> [#uses=1]
%H = fadd reassoc double %F, %G ; <double> [#uses=1]
%I = fsub reassoc double %H, %E ; <double> [#uses=1]
Before https://reviews.llvm.org/D45710, `reassoc` flag actually
did not imply isContratable either.
The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.
Reviewed By: spatel, qiucf
Differential Revision: https://reviews.llvm.org/D104247
Fixes a minor bug when trying to iterate through use operands when
updating debug use operands.
Extends a test to include above.
Differential Revision: https://reviews.llvm.org/D104576
TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
- Distinct metadata needs generating in the codegen to attach correct
AAInfo on the loads/stores after lowering, merging, and other relevant
transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
which provides basic interfaces to create/retrive metadata slots. In
addition, once LLVM IR is processsed, additional hooks are also
introduced to help collect machine metadata and assign them slot
numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
an additional 'machineMetadataNodes' field containing all the
definition of those nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103205
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
The Interleave Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.
Differential Revision: https://reviews.llvm.org/D104255
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.
Here I add the variable to the use list in IR level's AddDiscriminators
pass. The machine level code is still keep in the case IR's
AddDiscriminators is not invoked. If this is the case, this just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.
Differential Revision: https://reviews.llvm.org/D103988
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.
In this patch, we use a "common" linkage for this variable so that
it will be GC'ed by the linker.
Differential Revision: https://reviews.llvm.org/D103988
Iff we have `SCALAR_TO_VECTOR` (and we demand it's only defined 0'th element),
and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element
of some vector, then we can just continue traversal into said source vector.
This comes up in X86 vector uniform shift lowering.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104250
6e5628354e regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction. This uses a bit more compiler magic to deal with that.
The sorting, obviously, must be stable, else we will have random assembly fluctuations.
Apparently there was no test coverage that would benefit from that,
so i've added one test.
The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104187
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
Register allocation may spill virtual registers to the stack, which can
increase alignment requirements of the stack frame. If the the function
did not require stack realignment before register allocation, the
registers required to do so may not be reserved/available. This results
in a stack frame that requires realignment but can not be realigned.
Instead, only increase the alignment of the stack if we are still able
to realign.
The register SpillAlignment will be ignored if we can't realign, and the
backend will be responsible for emitting the correct unaligned loads and
stores. This seems to be the assumed behaviour already, e.g.
ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot
are both `canRealignStack` aware.
Differential Revision: https://reviews.llvm.org/D103602
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
When reducing vector builds to shuffles it possible that
the DAG combiner may try to extract invalid subvectors.
This happens as the existing code assumes vectors will be power
of 2 sizes, which is already untrue, but becomes more noticable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103880
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.
-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.
Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.
Reviewed By: rsmith, qcolombet
Differential Revision: https://reviews.llvm.org/D103928
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D95425
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103202
As shown in:
https://llvm.org/PR50623
...and the similar tests here, we were not accounting for
store merging of different sizes that do not cover the
entire range of the wide value to be stored.
This is the easy fix: just make sure that all of the
original stores are the same size, so when we calculate
the wide width, it's a simple N * M check.
This still allows all of the motivating optimizations from:
D86420 / 54a5dd485c
D87112 / 7a06b166b1
We could enhance this code to track individual bytes and
allow merging multiple sizes.
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.
This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.
This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103884
G_INSERT legalization is incomplete and doesn't work very
well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES
padding with undef values (although this can get pretty large).
For the case of load/store narrowing, this is still performing the
load/stores in irregularly sized pieces. It might be cleaner to split
this down into equal sized pieces, and rely on load/store merging to
optimize it.
When narrowing G_ADD and G_SUB, handle types that aren't a multiple of
the type we're narrowing to. This allows us to handle types like s96
on 64 bit targets.
Note that the test here has a couple of dead instructions because of
the way the setup legalizes. I wasn't able to come up with a way to
write this test that avoids that easily.
Differential Revision: https://reviews.llvm.org/D97811
When narrowing G_INSERT, handle types that aren't a multiple of the
type we're narrowing to. This comes up if we're narrowing something
like an s96 to fit in 64 bit registers and also for non-byte multiple
packed types if they come up.
This implementation handles these cases by extending the extra bits to
the narrow size and truncating the result back to the destination
size.
Differential Revision: https://reviews.llvm.org/D97791
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))
If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.
Fixes PR50609
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76a.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.
The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D102515
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only a i32.i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.
Differential Revision: https://reviews.llvm.org/D103756
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759