We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.
This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.
Differential Revision: https://reviews.llvm.org/D113351
VE's integrated assembler has been the default in Clang. Use the default setting for the integrated assembler in the backend as well.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D113384
We already have patterns for fptosi and fptoui plus fmul to fixed-point
convert; this adds equivalent patterns for fptosi.sat and fptoui.sat,
which should apply equally well to the legal saturating variants.
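For illustration, a hedged sketch of the kind of IR these patterns aim at (the vector types and the 2^16 scale factor are made up for the example):

```
; fmul by a power of two feeding a saturating convert can be selected as a
; single fixed-point convert, mirroring the existing fptosi/fptoui patterns.
define <4 x i32> @to_fixed(<4 x float> %x) {
  %scaled = fmul <4 x float> %x, <float 65536.0, float 65536.0, float 65536.0, float 65536.0>
  %r = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float> %scaled)
  ret <4 x i32> %r
}
declare <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float>)
```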
Differential Revision: https://reviews.llvm.org/D113199
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.
I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans, I had to change the calls to LVP.hasPlanWithVFs
so that we only pass in the fixed-width VF.
New tests added here:
Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
Differential Revision: https://reviews.llvm.org/D109432
Perform the rearrangement of add/sub and mul instructions to match the madd/msub pattern.
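A hedged sketch of the target pattern (AArch64 assumed here; the register choices are illustrative):

```
define i32 @f(i32 %a, i32 %b, i32 %c) {
  %m = mul i32 %a, %b
  %r = add i32 %m, %c    ; selects to: madd w0, w0, w1, w2
  ret i32 %r
}
```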
Reviewed By: dmgreen, sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D111862
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.
This addresses the comments in abandoned patch D113178; we keep using `f0` instead
of `vs0` for XXPERMDIs on purpose.
CSKY is an architecture which natively supports a mixture of 16-bit and 32-bit instructions,
and there is no individual predicate or feature to enable/disable the 16-bit instructions.
So it's better to add the 16-bit instructions early, and use 16-bit and 32-bit instructions together naturally.
Differential Revision: https://reviews.llvm.org/D112919
For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level.
With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.
For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
When accumulating the GEP offset in BasicAA, we should use the
pointer index size rather than the pointer size.
Differential Revision: https://reviews.llvm.org/D112370
Be more consistent in the naming convention for the various RET instructions, specifying them in terms of bitwidth.
Helps prevent future scheduler model mismatches like those that were only addressed in D44687.
Differential Revision: https://reviews.llvm.org/D113302
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes) still requires more work.
Add a variant of getEquivalentICmp() that produces an optional
offset. This allows us to create an equivalent icmp for all ranges.
Use this in the with.overflow folding code, which was doing this
adjustment separately -- this clarifies that the fold will indeed
always apply.
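As a worked illustration of why the offset helps (constants made up): the range [4, 10) has no exactly equivalent single icmp against %x, but it does against %x plus an offset:

```
; x in [4, 10)  <=>  (x - 4) u< 6
%adj = add i32 %x, -4
%in  = icmp ult i32 %adj, 6
```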
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: it is possible that
the sign boundary is crossed by an object.
I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).
Differential Revision: https://reviews.llvm.org/D113343
The public API for this functionality is forgetValue(). There was
only one call from LoopVectorize, which was directly next to a
forgetValue() call and as such redundant.
This finally creates proper test coverage for replication shuffles,
which are used by LV for conditional loads, and will allow adding a
proper cost model, at least for AVX512.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D113324
Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons,
including testability and reuse; let's do better.
In a followup, `getUserCost()` will be taught to use it to estimate the mask costs,
which will allow for better cost model tests for it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D113313
umax(X, Op1) - Op1 --> usub.sat(X, Op1)
https://alive2.llvm.org/ce/z/HpcGiJ
This happens in 2 or more steps with an icmp-select idiom
instead of an intrinsic. This is another step towards
canonicalization of the min/max intrinsics. See:
D98152
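In IR, a sketch of the matched idiom and its replacement:

```
define i32 @src(i32 %x, i32 %y) {
  %max = call i32 @llvm.umax.i32(i32 %x, i32 %y)
  %sub = sub i32 %max, %y        ; umax(X, Op1) - Op1
  ret i32 %sub
}
; --> %sub = call i32 @llvm.usub.sat.i32(i32 %x, i32 %y)
declare i32 @llvm.umax.i32(i32, i32)
```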
Matching a recent clang change I've made, now 'int[3]' is formatted
without the space between the type and array bound. This commit updates
libDebugInfoDWARF/llvm-dwarfdump to match that formatting.
This patch abstracts the VWholeLoad* classes into VWholeLoadN, simplifies
existing code, and fixes a typo.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D109319
This patch refactors classes for load/store of V extension by:
- Introduce new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF
so that uses of L/SUMOP* are not spread around different places.
- Reorder classes for Unit-Stride load/store in line with the table
describing lumop/sumop in riscv-v-spec.pdf.
Reviewed By: HsiangKai, craig.topper
Differential Revision: https://reviews.llvm.org/D109318
Actually we can, for now, remove the explicit "operator" handling
entirely - since clang currently won't try to flag any of these as
rebuildable. That seems like a reasonable state for now, but it could be
narrowed down to only apply to conversion operators, most likely - but
would need more nuance for op> and op>> since they would be incorrectly
flagged as already having their template arguments (due to the trailing
'>').
The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop.
Differential Revision: https://reviews.llvm.org/D109457
These `M` parameters shadow the `M` member in `VerifierSupport`, and
both always refer to the same module. Eliminate the redundant parameters
and always use the member.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106474
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.
Differential Revision: https://reviews.llvm.org/D113191
When we have an actual shuffle, we can impose the additional restriction
that the mask replicates the elements of the first operand, so we know
the replication factor as a ratio of output and op0 vector sizes.
This fixes lld/COFF/pdb-natvis.test (which only is run on Windows)
when using paths with forward slashes on Windows.
Differential Revision: https://reviews.llvm.org/D113265
This is the 'or' sibling for the fold added with:
D113212
https://alive2.llvm.org/ce/z/tgnp7K
Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.
We do not have the scalar version of *this* fold,
however, so that is another follow-up.
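Presumably the fold has this shape, mirroring the 'and' version (a sketch; the exact operand order is illustrative):

```
; (X s< 0) ? -1 : Y  -->  (X s>> BW-1) | Y
%c = icmp slt <4 x i32> %x, zeroinitializer
%r = select <4 x i1> %c, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> %y
; -->
%sign = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%r.2  = or <4 x i32> %sign, %y
```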
This is to revert commit f95bd18b5f (Revert "[Attr] support
btf_type_tag attribute") plus a bug fix.
Previous change failed to handle cases like below:
$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
In such cases, during clang IR generation, for function a(),
CGCodeGen has numParams = 1 for FunctionType. But for
FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using
FunctionType.numParams as the bound to access FuncTypeLoc
params, a random crash is triggered. The bug fix is to
check against FuncTypeLoc.NumParams before accessing
FuncTypeLoc.getParam(Idx).
Differential Revision: https://reviews.llvm.org/D111199
Try to simplify G_UADDO with 8 or 16 bit operands to a wide G_ADD and TBNZ if the
result is only used in the no-overflow case. It is restricted to cases
where we know that the high bits of the operands are 0. If there's an
overflow, then the 9th or 17th bit must be set, which can be checked
using TBNZ.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D111888
Having a NoOpLoopNestPass makes it possible to check with a lit test
that a LoopNestPass is only invoked on the outermost loop.
There are some existing passes that are implemented as LoopNestPass, but
they are still using LOOP_PASS macro.
It would be easier to identify LoopNestPasses with a LOOPNEST_PASS
macro.
Differential Revision: https://reviews.llvm.org/D113185
There is a combine in instcombine to transform a saturated add/sub into
a saddsat/ssubsat, currently handling inputs which are both sign
extended (https://alive2.llvm.org/ce/z/68qpTn). This can generalize to,
for example, an ashr of at least the bitwidth (https://alive2.llvm.org/ce/z/4TFyX-
and https://alive2.llvm.org/ce/z/qDWzFs for example), which means it
generalizes further to "the number of sign bits" needing to be enough
to truncate to the size of the saturate. (An example using `or` for
instance: https://alive2.llvm.org/ce/z/EI_h_A).
So this patch makes use of ComputeNumSignBits (with the newly added
ComputeMinSignedBits) in matchSAddSubSat to generalize the fold to any
inputs with enough sign bits known, truncating the inputs to the new
size of the saturate.
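A hedged sketch of the result for the ashr case (i64 inputs with 33+ known sign bits, saturating at i32; the names are made up):

```
%xt  = trunc i64 %x to i32
%yt  = trunc i64 %y to i32
%sat = call i32 @llvm.sadd.sat.i32(i32 %xt, i32 %yt)
%res = sext i32 %sat to i64
```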
Differential Revision: https://reviews.llvm.org/D112298
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns BitWidth - SignBits + 1 from ComputeNumSignBits, and represents
the minimum bit size for the value as a signed integer. Similar to the
existing APInt::getMinSignedBits method, this can make some of the
reasoning around ComputeNumSignBits more natural.
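For example, a value sign-extended from i8 to i32 has at least 25 sign bits, so its ComputeMinSignedBits is 32 - 25 + 1 = 8, recovering the original i8 width.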
See https://reviews.llvm.org/D112298
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.
We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.
Differential Revision: https://reviews.llvm.org/D113276
Even though AVX512's masked mem ops (unlike AVX1/2) have a mask
that is a `VF x i1`, replication of said masks happens after
promotion of it to `VF x i8`, so we should use `i8`, not `i1`,
when calculating the cost of mask replication.
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y
We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.
Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.
Differential Revision: https://reviews.llvm.org/D113212
Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the `VF` elements
in a vector `ReplicationFactor` times. For example, the mask for
`ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.
As discussed in previous installments, while it is used by LV,
and we **seem** to support masked interleaved memory operations on X86,
its support in the cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.
As it has been witnessed in the recent
AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()`
costmodel patches, while it is hard enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines in LV costmodel tests must be updated manually.
This is, at the very least, boring.
Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching
the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize
`Instruction::ShuffleVector` instructions that are replication masks,
adding exhaustive test coverage
using `-cost-model -analyze` + `utils/update_analyze_test_checks.py`.
This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.
This patch adds precise undef-aware `isReplicationMask()`,
with exhaustive test coverage.
* `InstructionsTest.ShuffleMaskIsReplicationMask` shows that
it correctly detects all the known masks.
* `InstructionsTest.ShuffleMaskIsReplicationMask_undef`
shows that replacing some mask elements in a known replication mask
still allows us to recognize it as a replication mask.
Note, with enough undef elts, we may detect a different tuple.
* `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness`
shows that if we detected the replication mask with given params,
then if we actually generate a true replication mask with said params,
it matches element-wise ignoring undef mask elements.
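In IR terms, the `ReplicationFactor=3`, `VF=4` mask above corresponds to a shuffle like:

```
%rep = shufflevector <4 x i32> %v, <4 x i32> undef,
       <12 x i32> <i32 0, i32 0, i32 0, i32 1, i32 1, i32 1,
                   i32 2, i32 2, i32 2, i32 3, i32 3, i32 3>
```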
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D113214
When creating a UUNPKLO/HI node with an undef input, the
output should also be undef. I've added a target DAG combine
function to ensure we avoid creating an unnecessary uunpklo/hi
instruction.
Differential Revision: https://reviews.llvm.org/D113266
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.
Reviewed By: #libc, ldionne, mehdi_amini
Differential Revision: https://reviews.llvm.org/D113186
NumOps represents the number of elements for vector constant folding; rename this to NumElts so that in future we can consistently use NumOps to represent the number of operands of the opcode.
Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
A pattern was selecting the wrong uaddlv MI. It should be as below.
uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8)
Differential Revision: https://reviews.llvm.org/D113263
This symbol is defined in libc.so so it is definitely not DSO-Local.
Marking it as such causes problems on some platforms (such as PowerPC).
Differential revision: https://reviews.llvm.org/D109090
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.
The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.
One of the yak shaving fixes for D113192....
Differential Revision: https://reviews.llvm.org/D113202
ppc_fp128 and fp128 are both 128-bit floating point types. However, we
can't do conversion between them now, since trunc/ext are not allowed
for same-size fp types.
This patch adds two new intrinsics, llvm.ppc.convert.f128.to.ppcf128 and
llvm.ppc.convert.ppcf128.to.f128, to support such conversion.
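The assumed signatures (a sketch inferred from the description above):

```
declare ppc_fp128 @llvm.ppc.convert.f128.to.ppcf128(fp128)
declare fp128 @llvm.ppc.convert.ppcf128.to.f128(ppc_fp128)
```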
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D109421
Default to preferring forward slashes when built for MinGW, as
many use cases, e.g. when Clang is used as a drop-in replacement
for GCC, require the compiler to output paths with forward slashes.
Not all tests pass yet if configured to prefer forward slashes, though.
Differential Revision: https://reviews.llvm.org/D112787
This normalizes most paths (except ones input from the user as command
line arguments) into the preferred form, if `real_style()` evaluates to
`windows_forward`.
Differential Revision: https://reviews.llvm.org/D111880
This behaves just like the regular Windows style, with both separator
forms accepted, but with get_separator() returning forward slashes.
Add a more descriptive name for the existing style, keeping the old
name around as an alias initially.
Add a new function `make_preferred()` (like the C++17
`std::filesystem::path` function with the same name), which converts
windows paths to the preferred separator form (while this one works on
any platform and takes a `path::Style` argument).
Contrary to `native()` (just like `make_preferred()` in `std::filesystem`),
this doesn't do anything at all on Posix; it doesn't try to reinterpret
backslashes into forward slashes there.
Differential Revision: https://reviews.llvm.org/D111879
This reverts commits 737e4216c5 and
ce7ac9e66a.
After those commits, the compiler can crash with a reduced
testcase like this:
$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
Constraint `*m` should be used when the address of a variable is passed
as a value, but the constraint is missing for MS inline assembly when something
is written to the address of the variable.
The omission would cause the FE to delete the definition of the static variable,
resulting in an "undefined reference to xxx" issue.
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D113096
Added and implemented the -asan-use-stack-safety flag, which controls whether ASan uses the Stack Safety results to emit less code for operations which are marked as 'safe' by the static analysis.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D112098
A new kind BTF_KIND_TYPE_TAG is defined. The tags associated
with a pointer type are emitted in their IR order as modifiers.
For example, for the following declaration:
int __tag1 * __tag1 __tag2 *g;
The BTF type chain will look like
VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int
In the above "->" means BTF CommonType.Type which indicates
the point-to type.
Differential Revision: https://reviews.llvm.org/D113222
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).
If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can call
FAM.registerPass([&] { return std::move(MyAA); });
before calling
PB.registerFunctionAnalyses(FAM);
For example, LTOBackend.cpp and NewPMDriver.cpp do this.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113210
If a tool wants to introduce new indirections via stubs at link-time in
ORC, it can cause fidelity issues around the address of the function if
some references to the function do not have relocations. This is known
to happen inside the body of the function itself on x86_64 for example,
where a PC-relative address is formed, but without a relocation.
```
_foo:
leaq -7(%rip), %rax ## form pointer to '_foo' without relocation
_bar:
leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo'
```
The consequence of introducing a stub for such a function at link time
is that if it forms a pointer to itself without relocation, it will not
have the same value as a pointer from outside the function. If the
function pointer is used as a key, this can cause problems.
This utility provides best-effort support for adding such missing
relocations using MCDisassembler and MCInstrAnalysis to identify the
problematic instructions. Currently it is only implemented for x86_64.
Note: the related issue with call/jump instructions is not handled
here, only forming function pointers.
rdar://83514317
Differential revision: https://reviews.llvm.org/D113038
Specifically in DWARFv5 the unit for the line table entry was correct
but the context was incorrect - leading to looking up .debug_line_str in
the dwp instead of the executable.
(perhaps we could/should remove the context pointer entirely, and rely
on the one in the unit... I might try that as a separate follow-up
commit)
This relaxes the one-use requirement on the rotation transform specifically for the case where we know we're zexting an IV of the loop. This allows us to discover trip count information in SCEV, which seems worth a single extra loop-invariant truncate. Honestly, I'd prefer it if SCEV could just compute the trip count directly (e.g. D109457), but this unblocks a practical benefit.
This patch adds clang codegen and llvm support
for the btf_type_tag attribute. Currently, btf_type_tag
attribute info is preserved in DebugInfo IR only for
pointer types associated with typedef, global variable
and function declaration. Eventually, such information
is emitted to dwarf.
The following is an example:
$ cat test.c
#define __tag __attribute__((btf_type_tag("tag")))
int __tag *g;
$ clang -O2 -g -c test.c
$ llvm-dwarfdump --debug-info test.o
...
0x0000001e:   DW_TAG_variable
                DW_AT_name      ("g")
                DW_AT_type      (0x00000033 "int *")
                DW_AT_external  (true)
                DW_AT_decl_file ("/home/yhs/test.c")
                DW_AT_decl_line (2)
                DW_AT_location  (DW_OP_addr 0x0)

0x00000033:   DW_TAG_pointer_type
                DW_AT_type      (0x00000042 "int")

0x00000038:     DW_TAG_LLVM_annotation
                  DW_AT_name        ("btf_type_tag")
                  DW_AT_const_value ("tag")

0x00000041:     NULL

0x00000042:   DW_TAG_base_type
                DW_AT_name      ("int")
                DW_AT_encoding  (DW_ATE_signed)
                DW_AT_byte_size (0x04)

0x00000049:   NULL
Basically, a DW_TAG_LLVM_annotation tag will be inserted
under DW_TAG_pointer_type tag if that pointer has a btf_type_tag
associated with it.
Differential Revision: https://reviews.llvm.org/D111199
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unnecessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D113080
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.
This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.
Differential Revision: https://reviews.llvm.org/D113094
This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487
Note that
a) it doesn't need the test changes anymore, and
b) I checked at least locally it passes other.test_pthread_lsan_leak
Differential Revision: https://reviews.llvm.org/D113208
The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which will use uint64_t in the 64-bit format and uint32_t in the 32-bit format. This patch changes it to use uint64_t in both formats.
This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change.
Differential Revision: https://reviews.llvm.org/D112679
This interface should not have existed in the first place, let alone
be a public member.
It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous.
The interfaces to be used are either getFixedValue() or getKnownMinValue().
Function specialisation was running at all optimisation levels (if enabled on
the command line; it is not on by default). That was an oversight and not
something we want to do. Function specialisation duplicates functions when it
triggers, so the backend is processing more functions/instructions, resulting in
compile-time increases, which seems more appropriate with -O3 and in line with
GCC. Please note that since function specialisation is not enabled by default,
this didn't require updating any pass manager tests.
Differential Revision: https://reviews.llvm.org/D112129
Coroutines have weird semantics that don't quite match normal LLVM functions,
so trying to infer even simple attributes based on their contents can go wrong.
The added test has poison lanes due to the vector shuffle. This can
cause an infinite loop of combines in instcombine where it folds
xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1).
We usually prevent this by checking that the xor constant is not -1,
but with vectors some of the lanes may be -1, some may be poison. So
this changes the way we detect that from "!C1->isAllOnesValue()" to
"!match(C1, m_AllOnes())", which is more able to detect that some of the
lanes are poison.
Fixes PR52397
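A sketch of the kind of constant involved (values made up): the xor constant below fails isAllOnesValue(), but m_AllOnes() still matches it as an all-ones splat with undef lanes, so the canonicalization is now correctly blocked:

```
%shr = ashr <2 x i32> %x, <i32 31, i32 31>
%not = xor <2 x i32> %shr, <i32 -1, i32 undef>
```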
Currently, FPSCR is not modeled, so in some early passes (such as
early-cse), the read/set intrinsics to FPSCR may get incorrect
simplification.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112380
There is no real source location for code inside the prologue, as it is
generated by the compiler, but source locations were being added to code
inside the prologue as a side effect of https://reviews.llvm.org/D99269,
because buildSpillLoadStore() uses the source location of the real
instruction in the basic block, if any.
Fixes: SWDEV-307590
Reviewed By: scott.linder, sebastian-ne
Differential Revision: https://reviews.llvm.org/D113100
Not sure these are correct. I think I missed a case when porting this from the original SCEV change to the IndVar changes. I may end up reapplying this later with a comment about how this is correct, but in case the current bad feeling turns out to be true, I'm removing from tree while investigating further.
We already make sure to properly clear analyses for deleted functions.
This makes investigating some future potential compile time improvements easier.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113032
Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32
so that any changes to hasVInstructionsF32 are reflected.
The files were missed in D112496.
This is a re-commit of e2c7ee0743 which
was reverted in a2a58d91e8. This includes
a fix to consistently check for EFLAGS being live-out. See phabricator
review.
Original Summary:
This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:
CMP x, 13
...
CMP x, 12 ; can be removed if we change the SETg
SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP
Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.
Differential Revision: https://reviews.llvm.org/D110867
Detailed description: This change enables selection of the bit field extract
patterns to either s_bfe_u32 or v_bfe_u32, depending on the divergence of the
pattern root node.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D110950
This takes care of cleaning up the temp files on crashes. It doesn't
handle cleanup when explicitly killed though.
Differential Revision: https://reviews.llvm.org/D112710
This change looks for cases where we can prove that an exit test of a loop can be performed in a narrower bitwidth, and that by doing so we can replace a loop-varying extend with a loop-invariant truncate.
The motivation here is that doing this unblocks the trip count analysis for narrow IVs involved in extended compare exit tests. It also has the nice side effect of simply making the code faster, even if we gain no other benefit from the improved analysis ability.
I've noted a few places this could be extended, but I think this stands reasonably on its own as well.
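A hedged before/after sketch (assuming the analysis can prove the bound fits in 32 bits):

```
; before: loop-varying extend in the exit test
%iv.wide = zext i32 %iv to i64
%exit    = icmp ult i64 %iv.wide, %bound
; after: narrow compare against a loop-invariant truncate
%bound.narrow = trunc i64 %bound to i32    ; hoisted out of the loop
%exit.narrow  = icmp ult i32 %iv, %bound.narrow
```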
Differential Revision: https://reviews.llvm.org/D112262
The check for whether a zero extension was needed was subtly wrong and
saw a value that was already 64 bits, so did not extend.
Fixes PR52357.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112860
This caused miscompiles of switches; see comments on the code review.
> This extends `optimizeCompareInstr` to re-use previous comparison
> results if the previous comparison was with an immediate that was 1
> bigger or smaller. Example:
>
> CMP x, 13
> ...
> CMP x, 12 ; can be removed if we change the SETg
> SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP
>
> Motivation: This often happens because SelectionDAG canonicalization
> tends to add/subtract 1 often when optimizing for fallthrough blocks.
> Example for `x > C` the fallthrough optimization switches true/false
> blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
> `x < C + 1`.
>
> Differential Revision: https://reviews.llvm.org/D110867
This reverts commit e2c7ee0743.
I don't really buy that masked interleaved memory loads/stores are supported on X86.
There is zero costmodel test coverage, no actual cost modelling for the generation
of the mask repetition, and basically only two LV tests.
Additionally, i'm not very interested in AVX512.
I don't know if this really helps "soft" block over at
https://reviews.llvm.org/D111460#inline-1075467,
but i think it can't make things worse at least.
When we are being told that there is a masking, instead of
completely giving up and falling back to
fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`,
let's correctly query the cost of masked memory ops,
keep all the pretty shuffle cost modelling,
but scalarize the cost computation for the mask replication.
I think, not scalarizing the shuffles themselves
may adjust the computed costs a bit,
and maybe hopefully just enough to hide the "regressions"
at https://reviews.llvm.org/D111460#inline-1075467
I do mean hide, because the test coverage is non-existent.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112873
The patch in D112349 added a previously nonexistent restriction on ifunc
resolvers: that they MUST be definitions. However, function
multiversioning depends on being able to resolve these resolvers at
link time, so this additional restriction was breaking it.
The recipe produces exactly one VPValue and can inherit directly from
it. This is in line with other recipes and avoids having to use
getVPSingleValue.
A reserved register:
- is not allocatable
- is considered always live
- is ignored by liveness tracking
NVPTX special registers match the criteria, and marking them as
reserved helps to avoid a machine verifier error:
*** Bad machine code: Using an undefined physical register ***
- function: foo
- basic block: %bb.0 (0x557bb178b708)
- instruction: %0:int32regs = MOV_SPECIAL $envreg0
- operand 1: $envreg0
Differential Revision: https://reviews.llvm.org/D113008
Add a warning to TableGen for unused template arguments in classes and
multiclasses, for example:
multiclass Foo<int x> {
  def bar;
}
$ llvm-tblgen foo.td
foo.td:1:20: warning: unused template argument: Foo::x
multiclass Foo<int x> {
                   ^
A flag '--no-warn-on-unused-template-args' is added to disable the
warning. The warning is disabled for LLVM and sub-projects if
'LLVM_ENABLE_WARNINGS=OFF'.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D109359
TargetExternalSymbol is considered to be an immediate and not a
register, so machine verifier emits an error:
*** Bad machine code: Expected a register operand. ***
- function: static_offset
- basic block: %bb.0 bb (0x560e9b306028)
- instruction: %3:int64regs = MoveParamI64 &static_offset_param_1
- operand 1: &static_offset_param_1
The patch adds variants of this instruction with an immediate operand
for byval arguments on 64-bit and 32-bit targets.
Differential Revision: https://reviews.llvm.org/D113006
LLVM has the habit of turning adds with no common bits set into ors,
which means we need to detect them and treat them like adds again in the
MVE gather/scatter lowering pass.
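For example (a sketch; assumes the low bits of %base are known zero, so no bits are shared):

```
; 'or' with disjoint bits is an 'add' in disguise when forming addresses:
%offs = or <4 x i32> %base, <i32 1, i32 1, i32 1, i32 1>
; treat as: %offs = add <4 x i32> %base, <i32 1, i32 1, i32 1, i32 1>
```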
Differential Revision: https://reviews.llvm.org/D112922
This teaches the MVE gather scatter lowering pass that SHL is
essentially the same as Mul, where we are able to optimize the
induction of a gather/scatter address by pushing them out of loops.
https://alive2.llvm.org/ce/z/wG4VyT
Differential Revision: https://reviews.llvm.org/D112920
Implement two builtins to pack/unpack IBM extended long double float,
according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112055
Before this patch, flags such as undef were dropped by TII::insertBranch
(used by BranchFolding pass), resulting in the following error from
machine verifier:
*** Bad machine code: Reading virtual register without a def ***
- function: hoge
- basic block: %bb.0 bb (0x562e9c240e68)
- instruction: CBranch %2:int1regs, %bb.3
- operand 0: %2:int1regs
Differential Revision: https://reviews.llvm.org/D113001
In D71220 a pattern was added to replace a shuffle's insertelement operand
if the inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to the element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.
This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.
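A minimal sketch of the newly covered case (mask size 2, vector size 4): lane 3 is never demanded, so the insertelement can be bypassed:

```
%ins  = insertelement <4 x i32> %v, i32 %s, i32 3
%shuf = shufflevector <4 x i32> %ins, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
; --> %shuf = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
```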
Differential Revision: https://reviews.llvm.org/D112318
If we assume SPMD-mode during the fixpoint iteration we have to execute
the kernel in SPMD-mode. If we change our mind during manifest there is
the chance of a mismatch between the simplification, e.g., of
`__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem
was introduced in D109438.
This patch is a compromise to resolve the problem purely in OpenMP-opt
while trying to keep the benefits of D109438 around. This might not
always work, see `get_hardware_num_threads_in_block_fold` but it often
does. At the same time we do keep value specialization and execution
mode in sync.
Proper solutions to this problem should be considered. I believe a new
execution mode is the easiest way forward (Singleton-SPMD).
Alternatively, SPMD-mode execution can be used with a way to provide a
new thread_limit (here 1) to the runtime. This is more general and could
be useful if we see `num_threads` clauses or workshared loops with small
trip counts in the kernel. In either proposal we need to disable the
guarding for the kernel (which was the motivation for D109438).
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D112894
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112893
Global symbols cannot have arbitrary names, so we need to sanitize the string
first. Also remove an assertion that is not actually necessary nor
true in general.
Reviewed By: ggeorgakoudis
Differential Revision: https://reviews.llvm.org/D112892
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR-to-AGPR copies, but it missed checking for SGPR
clobbering when generating the S_MOV_B64_IMM_PSEUDO.
Differential Revision: https://reviews.llvm.org/D113005
Data references in a loop should not access elements beyond the
statically allocated size, so we can infer a loop max trip count
from this undefined behavior.
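A hedged example of the inference (sizes made up):

```
; An inbounds access into a 1024-element alloca is UB for %iv >= 1024, so a
; loop stepping %iv from 0 by 1 gets a max trip count of 1024.
%a = alloca [1024 x i32]
%p = getelementptr inbounds [1024 x i32], [1024 x i32]* %a, i64 0, i64 %iv
%v = load i32, i32* %p
```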
Reviewed By: reames, mkazantsev, nikic
Differential Revision: https://reviews.llvm.org/D109821
The changes in D80163 deferred the assignment of MachineMemOperand (MMO)
until the X86ExpandPseudo pass. This can result in a crash due to the prolog
insert point being sunk across the pseudo instruction VASTART_SAVE_XMM_REGS.
Moving the assignment to the creation of the node avoids the problem.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D112859
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.
This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.
Differential Revision: https://reviews.llvm.org/D112882
All heuristics for variable accesses require both access sizes to
be known, so check this once at the start, rather than for each
particular heuristic.
If we know that the var * scale multiplication is nsw, we can use
a saturating multiplication on the range (as a good approximation
of an nsw multiply). This recovers some cases where the fix from
D112611 is unnecessarily strict. (This can be further strengthened
by using a saturating add, but we currently don't track all the
necessary information for that.)
This exposes an issue in our NSW tracking for multiplies. The code
was assuming that (X +nsw Y) *nsw Z results in
(X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the
distributed multiplications overflow, even if the non-distributed
one does not. We should discard the nsw flag if the the offset is
non-zero. If we just have (X *nsw Y) *nsw Z then concluding
X *nsw (Y *nsw Z) is fine.
Differential Revision: https://reviews.llvm.org/D112848
* Move `);` outside the #ENDIF. Syntax highlighters that highlight missed
closing parens, but are not aware of the C preprocessor, saw the original
code as having missed parens.
This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concating the broadcasts is OK, it's the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need of a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D113022
Added support for peeling loops with exits that are followed either by an
unreachable-terminated block or by a block that has a terminating deoptimize call.
All blocks in the sequence must have a unique successor, except perhaps
the last one.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D110922
Summary:
Add new options -print-changed=[dot-cfg | dot-cfg-quiet] which create
a website of DOT files showing colourized changes as the IR is changed
by passes in the new pass manager pipeline.
A new change reporter is introduced that creates a website of changes made
by passes in the opt pipeline that change the IR. The hidden option
-dot-cfg-dir=<dir> specifies a directory (defaulting to "./") into which the
website will be created.
A file passes.html is created that contains a list of all the passes that
act on the IR. Those that do not change the IR are listed as omitted
because of no change, ignored or filtered out (using -filter-print-func
and -filter-passes) or not listed in quiet mode. Those that
do change the IR are listed as a link to a DOT file which contains a
CFG depiction of the IR (ala -dot-cfg) except that the instructions,
basic blocks and links that are only in the IR before the pass (ie, removed)
and those that are only in the IR after the pass (ie, added) are shown in
red and green, respectively, while the aspects of the CFG that do not change
are shown in black. Additional hidden options
-dot-cfg-before-color=<dot named color>,
-dot-cfg-after-color=<dot named color> and
-dot-cfg-common-color=<dot named color> are defined that allow the
customization of the colors used in colorizing the CFG.
-change-printer-dot-path=<path to dot exe> is also added.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D87202
These were added to prevent functions from being removed by WPO.
But that doesn't make sense; correct WPO will not remove functions we actually use.
I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112971
Header "llvm/Transforms/Scalar/SimpleLoopUnswitch.h" is currently
included twice. This commit removes the duplicate 'include' line.
Previous commit 693eedb138
seems to have mistakenly added the duplicate 'include'.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D112979
The test diffs are logically equivalent, and so this is
generally NFC, but this makes the code match the code
comment.
It should also be more efficient. If we choose the 'not'
operand (rather than the 'not' instruction) as the select
condition, then we don't have to invert the select
condition/operands as a subsequent transform.
The scalarizer pass seems to be inserting instructions in between PHI nodes or debug intrinsics, which end up staying there at the end of the pass, resulting in malformed IR and violating assumptions.
This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics.
Patch by vettoreldaniele.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D112472
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
where $b is the i32 truncation of c, provided c can be truncated to i32 without loss.
Reviewed By: frasercrmck, craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D108129
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.
Also use the optimized funnel shift lowering for rotates.
Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept
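A hedged sketch of the expansion for the amount-below-halfwidth case (the real lowering also selects between the halves for larger amounts): an i64 fshl on a 32-bit target becomes two i32 funnel shifts:

```
; fshl(i64 %a, i64 %b, %amt) with %amt u< 32, operands split into i32 halves:
%res.hi = call i32 @llvm.fshl.i32(i32 %a.hi, i32 %a.lo, i32 %amt32)
%res.lo = call i32 @llvm.fshl.i32(i32 %a.lo, i32 %b.hi, i32 %amt32)
```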
(Branched from D108058 as getting this completed should help unlock some other WIP patches).
Original Patch: @efriedma (Eli Friedman)
Differential Revision: https://reviews.llvm.org/D112443
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5
Differential Revision: https://reviews.llvm.org/D111966
This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPRecipe are
propagated through to the reductions which are created.
Differential Revision: https://reviews.llvm.org/D112548
When a constant expression CE is being converted into a corresponding instruction I,
CE is supposed to be replaced by I. However, it is possible that CE is being used multiple
times within a parent instruction PI. Make sure that *all* the uses of CE within PI are
replaced by I.
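A hedged illustration (hypothetical constantexpr) of a parent instruction using the same CE twice:

```
; both ptrtoint uses inside the add must be rewritten, not just the first:
%v = add i64 ptrtoint (i32* @g to i64), ptrtoint (i32* @g to i64)
; becomes:
%t   = ptrtoint i32* @g to i64
%v.2 = add i64 %t, %t
```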
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D112717
When the linkage name isn't available in dwarf (usually the case for C code), looking up callee samples should be based on the dwarf name instead of using an empty string.
Also fixing a test issue where using an empty string to look up callee samples accidentally returned the correct samples because it was treated as an indirect call.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D112948
To support return (it not being supported well was the root cause of
https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have
at least a basic notion of unreachable, which in this case just means to stop
type checking until there is an end_block (an incoming control flow edge).
This is conservative (may miss on some type checking opportunities) but is
simple and an improvement over what we had before.
Differential Revision: https://reviews.llvm.org/D112953
Commit acabad9ff6 ("[InstCombine] try to canonicalize icmp with
trunc op into mask and cmp") added a transformation to
convert "(conv)a < power_2_const" to "a & <const>" in certain
cases and bpf kernel verifier has to handle the resulted code
conservatively and this may reject otherwise legitimate program.
This commit tries to prevent such a transformation. A bpf backend
builtin llvm.bpf.compare is added. The ICMP insn, which is subject to
the above InstCombine transformation, is converted to the builtin
function. The builtin function is later lowered to the original ICMP insn,
certainly after the InstCombine pass.
With this change, all affected bpf strobemeta* selftests now pass.
Differential Revision: https://reviews.llvm.org/D112938
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.
This was suggested by efriedma in D58260.
Fixes PR50881.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112178
It is now sufficient to track only direct addrec users of a loop,
and let the SCEVUsers mechanism track and invalidate transitive users.
Differential Revision: https://reviews.llvm.org/D112875
For NEON, FMA matching is done in the MachineCombiner, and not the
DAGCombiner. That causes problems with VLS lowering, since the
vectors are fixed width at the DAGCombiner, but are scalable in
the MachineCombiner. This patch corrects it by matching FMAs for
VLS vectors in the DAGCombiner.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D112557
Similar to 54e969cffd (and with cosmetic updates to hopefully
make that easier to read), this fold has been around since early
in LLVM history.
Intermediate folds have been added subsequently, so extra uses
are required to exercise this code.
The test example actually shows an unintended consequence with
extra uses - we end up with an extra instruction compared to what
we started with. But this at least makes scalar/vector consistent.
General proof:
https://alive2.llvm.org/ce/z/tmuBza
Previously we only applied it to the first one, which could allow
subsequent global tags to exceed the valid number of bits.
Reviewed By: hctim
Differential Revision: https://reviews.llvm.org/D112853
This fold was added long ago (part of fixing PR4216),
and it matched scalars only. Intermediate folds have
been added subsequently, so extra uses are required
to exercise this code.
General proof:
https://alive2.llvm.org/ce/z/G6BBhB
One of the specific tests:
https://alive2.llvm.org/ce/z/t0JhEB