AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.
Currently no users outside of unit tests.
Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp
Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D85159
Summary:
When looking for all reaching definitions, we sort basic blocks on dominance. When sorting looking for properlyDominates() handles the case A == B.
Authored by: pranavb
Differential Revision: https://reviews.llvm.org/D86661
This is a reboot of D84655, now performing the inner icmp
simplification query without undef folds.
It should be possible to handle the current foldMinMaxSharedOp()
fold based on this, by moving the logic into icmp of min/max instead,
making it more general. We can't drop the folds for constant operands,
because those also allow undef, which we exclude here.
The tests use assumes for exhaustive coverage, and have a few
more examples of misc folds we get based on icmp simplification.
Differential Revision: https://reviews.llvm.org/D85929
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.
Patch By: logan
We have a gap in our store merging capabilities for shift+truncate
patterns as discussed in:
https://llvm.org/PR46662
I generalized the code/comments for this function in earlier commits,
so we only need ease the type restriction and adjust the address/endian
checking to make this work.
AArch64 lets us switch endian to make sure that patterns are matched
either way.
Differential Revision: https://reviews.llvm.org/D86420
`GetFinalPathNameByHandleW(,,N,)` returns:
- `< N` on success (this value does not include the size of the terminating null character)
- `>= N` if buffer is too small (this value includes the size of the terminating null character)
So, when `N == Buffer.capacity() - 1`, we need to resize buffer if return value is > `Buffer.capacity() - 2`.
Also, we can set `N` to `Buffer.capacity()`.
Thus, without this patch `realPathFromHandle()` returns unfilled buffer when length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`.
Reviewed By: andrewng, amccarth
Differential Revision: https://reviews.llvm.org/D86564
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.
Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.
This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85946
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d>`.
Differential Revision: https://reviews.llvm.org/D86633
For `ld64` which uses legacy LTOCodeGenerator, it relies on
writeMergedModule to perform `ld -r` (generates a linked object file).
If all the inputs to `ld -r` is fullLTO bitcode, `ld64` will linked the
bitcode module, internalize all the symbols and write out another
fullLTO bitcode object file. This bitcode file doesn't have all the
bitcode inputs and it should not have LTOPostLink module flag. It will
also cause error when this bitcode object file is linked with other LTO
object file.
Fix the issue by not applying LTOPostLink flag during writeMergedModule
function. The flag should only be added when all the bitcode are linked
and ready to be optimized.
rdar://problem/58462798
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D84789
This would assert with unaligned DS access enabled. The offset may not
be aligned. Theoretically the pattern predicate should check the
memory alignment, although it is possible to have the memory be
aligned but not the immediate offset.
In this case I would expect it to use ds_{read|write}_b64 with
unaligned access, but am not clear if there's a reason it doesn't.
and indirect call promotion candidate.
Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.
However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.
To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.
In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).
Differential Revision: https://reviews.llvm.org/D86332
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.
This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86576
This is an older syntax than the {disp32} and {disp8} pseudo
prefixes that were added a few weeks ago. We can reuse most of
the support for that to support .d32 and .d8 as well.
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.
There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
Instead of computing GUID based on some assumption about symbol mangling
rule from IRName to symbol name, lookup the IRName from all the symtabs
from all the input files to see if there are any matching symbols entry
provides the IRName for GUID computation.
rdar://65853754
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D84803
Summary:
Support TOCU and TOCL relocation type for object file generation.
Reviewed by: DiggerLin
Differential Revision: https://reviews.llvm.org/D84549
The non-standard header file `<sysexits.h>` provides some return values.
`EX_IOERR` is used to as a special value to signal a broken pipe to the clang driver.
On z/OS Unix System Services, this header file does not exists. This patch
- adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX`
- adds a new header file `llvm/Support/ExitCodes`, which either includes
`<sysexits.h>` or defines `EX_IOERR`
- updates the users of `EX_IOERR` to include the new header file
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D83472
When floating point callee-saved registers were used, the frame pointer would
incorrectly point to the bottom of the CSR space (containing saved floating-point
registers), rather than to the frame record.
While all frame offsets were calculated consistently, resulting in working code,
this prevented stack walkers from being about to traverse the frame list.
This implements 2 different vectorisation fallback strategies if tail-folding
fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This
can be controlled with option -prefer-predicate-over-epilogue, that has been
changed to take a numeric value corresponding to the tail-folding preference
and preferred fallback.
Patch by: Pierre van Houtryve, Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D79783
If the condition output is negated, swap the branch targets. This is
similar to what SelectionDAG does for when SelectionDAGBuilder
decides to invert the condition and swap the branches.
This is leaving behind a dead constant def for some reason.
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.
This is the same optimization that was implemented for SelectionDAG in
D31731.
Differential Revision: https://reviews.llvm.org/D86609
This patch makes the unit_length and header_length fields of line tables
optional. yaml2obj is able to infer them for us.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86590
Before calling target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake cluster due to dependency.
For now, we are putting the loads/stores into the same group if they have
the same predecessor. We assume that, if two loads/stores have the same
predecessor, it is likely that, they didn't have dependency for each other.
However, one SUnit might have several predecessors and for now, we just
pick up the first predecessor that has non-data/non-artificial dependency,
which is too arbitrary. And we are struggling to fix it.
So, I am proposing some better implementation.
1. Collect all the loads/stores that has memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seeking for the first non-dependency instruction with the
sorted order, and check if they can cluster or not.
Reviewed By: Jay Foad
Differential Revision: https://reviews.llvm.org/D85517
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.
In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.
Differential Revision: https://reviews.llvm.org/D86444
If the basic block of the instruction passed to getUniqueReachingMIDef
is a transitive predecessor of itself and has a definition of the
register, the function will return that definition even if it is after
the instruction given to the function. This patch stops the function
from scanning the instruction's basic block to prevent this.
Differential Revision: https://reviews.llvm.org/D86607
pointer.
mwaitx uses EBX as one of its argument.
Using this instruction clobbers RBX as it is defined to hold one of the
input. When the backend uses dynamically allocated stack, RBX is used as
a reserved register for the base pointer.
This patch is adapted from @qcolombet patch for cmpxchg at r263325.
This fixes PR43528.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D73475
The switch in AArch64Operand::print was changed in D45688 so the shift
can be printed after printing the register. This is implemented with
LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between
the register and shift operands.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D86535
This fixes an issue where the restore point of callee-saves in the
function epilogues was incorrectly calculated when the basic block
consisted of only a RET instruction. This caused dealloc instructions
to be inserted in between the block of callee-save restore instructions,
rather than before it.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D86099
Currently `strace llvm-dwarfdump x.debug >/tmp/file`:
ioctl(1, TCGETS, 0x7ffd64d7f340) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " DW_AT_decl_line\t(89)\n"..., 4096) = 4096
ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(1, TCGETS, 0x7ffd64d7f410) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device)
After this patch:
write(1, "0000000000001102 \"strlen\")\n "..., 4096) = 4096
write(1, "site\n DW_AT_low"..., 4096) = 4096
write(1, "d53)\n\n0x000e4d4d: DW_TAG_G"..., 4096) = 4096
The same speedup can be achieved by `--color=0` but that is not much convenient.
This implementation has been suggested by Joerg Sonnenberger.
Differential Revision: https://reviews.llvm.org/D86406
This patch produces an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85547
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.
And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).
This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
it also sees that the extract-insert chain recreates the original aggregate,
and replaces it with the original aggregate.
The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
(i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)
This is a reland of the original commit fcb51d8c24,
because originally i forgot to ensure that the base aggregate types match.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86530
This reverts commit fcb51d8c24.
As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.
And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).
This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
it also sees that the extract-insert chain recreates the original aggregate,
and replaces it with the original aggregate.
The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
(i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86530
This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29.
All the issues relate to a pattern like this
size_t x64 = y64 + z32 * C
When z32 is >= (2^32)/C, z32 * C overflows.
Reviewed-by: MaskRay
Differential Revision: https://reviews.llvm.org/D86500
A Mach-O universal binary may contain bitcode as a slice.
This diff adds proper handling of such binaries to llvm-lipo.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D85740
Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extra again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but though maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.
The IsExtractedElement already called getOperand(0) so Extract
here is the source vector. We shouldn't call getOperand(0). This
worked for the original test cases because the result was a
bitcast so the getOperand(0) accidently peeked through the bitcast
which is what we wanted.
In the failing case here, the operand turns out to be undef so
the getOperand(0) asserts because undef has no operands.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184
Differential Revision: https://reviews.llvm.org/D86428
KMOVWkr produces VK16, there's no reason to copy it to VK16 again.
Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.
Differential Revision: https://reviews.llvm.org/D86549
There are two ways .llvmbc can be produced:
* clang -c -fembed-bitcode=all (which also produces .llvmcmd)
* LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode
.llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by
--gc-sections.
This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This
is conceptually correct: the two sections are not part of the process
image, so SHF_ALLOC is not appropriate.
`test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to
`llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D86374
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.
Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86477
We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.
For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D85583
Dead function has its body stripped away, and can cause various
analyses to panic. Also it does not make sense to apply analyses on
such function.
Reviewed By: xazax.hun, MaskRay, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D84715
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.
To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi node
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.
Reviewed By: Whitney, etiotto
Differential Revision: https://reviews.llvm.org/D86133
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.
Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.
But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)
SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.
Differential Revision: https://reviews.llvm.org/D86460
This adapts the verifier checks for intrinsic get.active.lane.mask to the new
semantics of it as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.
Differential Revision: https://reviews.llvm.org/D86301
This patch makes the 'Attributes' field optional. We don't need to
explicitly specify the 'Attributes' field in the future.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D86537
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the
backedge-taken count.
Differential Revision: https://reviews.llvm.org/D86302
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.
Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".
Differential Revision: https://reviews.llvm.org/D83048
Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.
This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
passed in to the intrinsic,
- we can immediately use that value to setup a counter for the number of
elements processed by the loop and don't need to materialize BTC + 1.
Differential Revision: https://reviews.llvm.org/D86303
This adapts LV to the new semantics of get.active.lane.mask as discussed in
D86147, which means that the LV now emits intrinsic get.active.lane.mask with
the loop tripcount instead of the backedge-taken count as its second argument.
The motivation for this is described in D86147.
Differential Revision: https://reviews.llvm.org/D86304
The arm backend does not handle select/select_cc on vectors with scalar
conditions, preferring to expand them in codegenprepare instead. This
usually works except when optimizing for size, where the optsize check
would end up overruling the backend isSelectSupported check.
We could handle the selects in ISel too, but this seems like smaller
code than trying to splat the condition to all lanes.
Differential Revision: https://reviews.llvm.org/D86433
Without the fix gcc 7.4 warns with
../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)':
../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.
Differential Revision: https://reviews.llvm.org/D86415
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.
I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.
I've added the following new tests:
Analysis/CostModel/AArch64/sve-trunc.ll
Transforms/InstCombine/AArch64/sve-trunc.ll
for both of these fixes.
Differential revision: https://reviews.llvm.org/D86432
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory access causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be to pessimistic, as that might
include accesses not on any path to the actual read clobber.
If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D75025
Explicitly check that there is a local def prior to the given
instruction in getReachingLocalMIDef instead of just relying on
a nullptr return from getInstFromId.
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
For the Windows GNU platform, CMAKE_FIND_LIBRARY_PREFIXES is a list
containing an empty string, which ended up in a regex capturing group,
which is invalid in CMake's regex engine. With this change, we get the
following:
set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "")
set(CMAKE_FIND_LIBRARY_SUFFIXES ".dll.a" ".a" ".lib")
get_system_libname(path/to/libz.dll.a zlib)
message("${zlib}")
outputs z, as expected.
Patch By: haampie
Differential Revision: https://reviews.llvm.org/D86434
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).
Differential Revision: https://reviews.llvm.org/D86481
With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X).
This is done after targets have the chance to produce a
reciprocal sqrt estimate sequence because that expansion
is probably more efficient than an expansion of a
non-reciprocal sqrt. That is also why we deferred doing
this transform in IR (D85709).
Differential Revision: https://reviews.llvm.org/D86403
PC-Relative addressing introduces a fair bit of complexity for correctly
eliminating TOC accesses. FastISel does not include any of that handling so we
miscompile code with -mcpu=pwr10 -O0 if it includes an external call that
FastISel does not handle followed by any of the following:
Floating point constant materialization
Materialization of a GlobalValue
Call that FastISel does handle
This patch switches to SDISel for any of the above.
Differential revision: https://reviews.llvm.org/D86343
This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64".
To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586.
I updated one llvm-mca test to check a different CPU since generic has a scheduler model now
Differential Revision: https://reviews.llvm.org/D86312
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.
But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)
SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.
Differential Revision: https://reviews.llvm.org/D86460
The "takeName" logic at the end of ScalarizerVisitor::finish
could end up renaming global variables when having simplified
and extractelement instruction to simply pick a single vector
element. If the input vector to the extractelement instruction
held pointers to global variables we ended up renaming the global
variable.
The patch make sure we only take the name of the replaced Op when
we have added new instructions that might need a useful name.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D86472
Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources.
Differential Revision: https://reviews.llvm.org/D68035
This reverts commit 2e43acfed8.
LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depent on Coroutines.h (circular dependency).
The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
This interferes with GlobalISel's much better handling of the
situation.
This should really be disable for GlobalISel. However, the fallback
only re-runs the selection passes, and doesn't go back and rerun any
codegen IR passes. I haven't come up with a good solution to this
problem.
D77152 tried to do this but got it wrong in the shift-by-zero case.
D86430 reverted the wrong code. Reimplement the optimization with
different code depending on whether the shift amount is known to be
non-zero (modulo bitwidth).
This improves code quality for fshl tests on AMDGPU, which only has an
fshr instruction.
Differential Revision: https://reviews.llvm.org/D86438
Using callCapturesBefore potentially improves the precision and the
number of stores we can remove. But in practice, it seems to have very
little impact in terms of stores removed. For example, for
SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are
removed (out of ~26900 stores removed). But in terms of compile-time, it
is very expensive and the patch gives substantial compile-time
improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g
-0.39%.
http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions
This commit makes FileCheck print all CHECK-NOT directive failure in a
CHECK-NOT block even if one fails. Prior to that, it would stop trying
to match CHECK-NOT directive as soon as one in the block fails.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86315
This patch adds frontend and backend options to enable and disable
the PowerPC MMA operations added in ISA 3.1. Instructions using these
options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D81442
summary:
When callee coroutine function is inlined into caller coroutine
function before coro-split pass, llvm will emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that coro-early
pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute
"coroutine.presplit" (it means the function has not been splited) to
fix this issue
TestPlan: check-llvm
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D85812
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
`llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Note that in some places the idiom
```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```
has been replaced with
```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```
The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.
Reviewed By: rengolin, ctetreau
Differential Revision: https://reviews.llvm.org/D85794
Handle workitem intrinsics. There isn't really away to adequately test
this right now, since none of the known bits users are fine grained
enough to test the edge conditions. This triggers a number of
instances of the new 64-bit to 32-bit shift combine in the existing
tests.
shl ([sza]ext x, y) => zext (shl x, y).
Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:
This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.
TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.
Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Differential Revision: https://reviews.llvm.org/D85794
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.
Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit-compile-time.
This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
If gather/scatters are enabled, ARMTargetTransformInfo now allows
tail predication for loops with a much wider range of strides, up
to anything that is loop invariant.
Differential Revision: https://reviews.llvm.org/D85410
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.
In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.
We can always re-visit that in the future.
Similar to the existing transform - peek through a select
to match a value and its negation.
https://alive2.llvm.org/ce/z/MXi5KG
define i8 @src(i1 %b, i8 %x) {
%0:
%neg = sub i8 0, %x
%sel = select i1 %b, i8 %x, i8 %neg
%abs = abs i8 %sel, 1
ret i8 %abs
}
=>
define i8 @tgt(i1 %b, i8 %x) {
%0:
%abs = abs i8 %x, 1
ret i8 %abs
}
Transformation seems to be correct!
This is a fixup of commit 0819a6416f (D77152) which could
result in miscompiles. The miscompile could only happen for targets
where isOperationLegalOrCustom could return different values for
FSHL and FSHR.
The commit mentioned above added logic in expandFunnelShift to
convert between FSHL and FSHR by swapping direction of the
funnel shift. However, that transform is only legal if we know
that the shift count (modulo bitwidth) isn't zero.
Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a
rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if
Z modulo bitwidth, could be zero.
```
$ ./alive-tv /tmp/test.ll
----------------------------------------
define i32 @src(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = fshl i32 %x, i32 %y, i32 %z
ret i32 %t0
}
=>
define i32 @tgt(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = sub i32 32, %z
%t1 = fshr i32 %x, i32 %y, i32 %t0
ret i32 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
i32 %x = #x00000000 (0)
i32 %y = #x00000400 (1024)
i32 %z = #x00000000 (0)
Source:
i32 %t0 = #x00000000 (0)
Target:
i32 %t0 = #x00000020 (32)
i32 %t1 = #x00000400 (1024)
Source value: #x00000000 (0)
Target value: #x00000400 (1024)
```
It could be possible to add back the transform, given that logic
is added to check that (Z % BW) can't be zero. Since there were
no test cases proving that such a transform actually would be useful
I decided to simply remove the faulty code in this patch.
Reviewed By: foad, lebedev.ri
Differential Revision: https://reviews.llvm.org/D86430
D70867 introduced support for expanding most ppc_fp128 operations. But
sitofp/uitofp is missing. This patch adds that after D81669.
Reviewed By: uweigand
Differntial Revision: https://reviews.llvm.org/D81918
We may meet Invalid CTR loop crash when there's constrained ops inside.
This patch adds constrained FP intrinsics to the list so that CTR loop
verification doesn't complain about it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81924
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
The following program miscompiles because rL216012 added static
relocation model support but not for PIC.
```
// clang -fpic -mcmodel=large -O0 a.cc
double foo() { return 42.0; }
```
This patch adds PIC support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86024
The pattern matching does not account for truncating stores,
so it is unlikely to work at later stages. So we are likely
wasting compile-time with no hope of improvement by running
this later.
This should be NFC in terms of output because the endian
check further down would bail out too, but we are wasting
time by waiting to that point to give up. If we generalize
that function to deal with more than i8 types, we should
not have to deal with the degenerate case.
This patch imports the instruction-referencing implementation of
LiveDebugValues proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
The new implementation is unreachable in this patch, it's the next patch
that enables it behind a command line switch. Briefly, rather than
tracking variable locations by just their location as the 'VarLoc'
implementation does, this implementation does it by value:
* Each value defined in a function is numbered, and propagated through
dataflow,
* Each DBG_VALUE reads a machine value number from a machine location,
* Variable _values_ are propagated through dataflow,
* Variable values are translated back into locations, DBG_VALUEs
inserted to specify where those locations are.
The ultimate aim of this is to enable referring to variable values
throughout post-isel code, rather than locations. Those patches will
build on top of this new LiveDebugValues implementation in later patches
-- it can't be done with the VarLoc implementation as we don't have
value information, only locations.
Differential Revision: https://reviews.llvm.org/D83047
This patch renames the current LiveDebugValues class to "VarLocBasedLDV"
and removes the pass-registration code from it. It creates a separate
LiveDebugValues class that deals with pass registration and management,
that calls through to VarLocBasedLDV::ExtendRanges when
runOnMachineFunction is called. This is done through the "LDVImpl"
abstract class, so that a future patch can install the new
instruction-referencing LiveDebugValues implementation and have it
picked at runtime.
No functional change is intended, just shuffling responsibilities.
Differential Revision: https://reviews.llvm.org/D83046
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.
The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136
There's an analysis motivating case at:
http://bugs.llvm.org/PR38781
I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.
Alive proofs:
https://rise4fun.com/Alive/BBV
Name: shift right of 'not'
Pre: C2 == (-1 u>> C1)
%a = lshr i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = lshr i8 %n, C1
Name: shift left of 'not'
Pre: C2 == (-1 << C1)
%a = shl i8 %x, C1
%r = xor i8 %a, C2
=>
%n = xor i8 %x, -1
%r = shl i8 %n, C1
Name: ashr of 'not'
%a = ashr i8 %x, C1
%r = xor i8 %a, -1
=>
%n = xor i8 %x, -1
%r = ashr i8 %n, C1
Differential Revision: https://reviews.llvm.org/D86243
This is a pure file move of LiveDebugValues.cpp ahead of the pass being
refactored, with an experimental new implementation to follow.
The motivation for these changes can be found here:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
And the other related changes can be found in the phabricator stack for
this revision:
Differential Revision: https://reviews.llvm.org/D83304
This patch adds support for representing Fortran `character(n)`.
Primarily patch is based out of D54114 with appropriate modifications.
Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.
Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.
Differential Revision: https://reviews.llvm.org/D85472
The TableGen range piece punctuator is currently '-' (e.g., {0-9}),
which interacts oddly with the fact that an integer literal's sign
is part of the literal. This patch replaces the '-' with the new
punctuator '...'. The '-' punctuator is deprecated.
Differential Revision: https://reviews.llvm.org/D85585
Change-Id: I3d53d14e23f878b142d8f84590dd465a0fb6c09c
As disscussed in post-commit review starting with
https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).
So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.
This reverts commit 1d51dc38d8.
TGParser::ParseValue contains two recursive calls, one to parse the RHS of a list paste operator and one to parse the RHS of a paste operator in a class/def name. Both of these calls neglect to check the return value to see if it is null (because of some error). This causes a crash in the next line of code, which uses the return value. The code now checks for null returns.
Differential Revision: https://reviews.llvm.org/D85852
The legacy PM alias analysis pipeline by default includes basic-aa.
When running `opt -foo-pass` under the NPM and -disable-basic-aa is not
specified, use basic-aa.
This decreases the number of check-llvm failures under NPM from 913 to 752.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D86167
The register class is required for inserting PHIs, but the "current
virtual register" isn't actually used for anything, so let's remove it
while we're at it.
Differential Revision: https://reviews.llvm.org/D85602
Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b
- Adds a command line option to seed only selected functions.
- Makes seed allow listing exclusive to assertions enabled builds.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D86129
Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position.
This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86361
When trying to enable -debug-info-kind=constructor there was an assert
that occurs during debug info cloning ("mismatched subprogram between
llvm.dbg.value variable and !dbg attachment").
It appears that during llvm::CloneFunctionInto, a DISubprogram could be
duplicated when MapMetadata is called, and then added to the MD map again
when DIFinder gets a list of subprograms. This results in two different
versions of the DISubprogram.
This patch switches the order so that the DIFinder subprograms are
added before MapMetadata is called.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46784
Differential Revision: https://reviews.llvm.org/D86185
This is the slowest operation in the already slow pass.
Instead of sorting just put a stall list into an ordered
map.
Differential Revision: https://reviews.llvm.org/D86253
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.
Also the CL removes duplicated Values in gc-live bundle.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
See http://lists.llvm.org/pipermail/llvm-dev/2017-June/113975.html for a related previous discussion.
Many tools install signal handlers to print stack traces and optionally
symbolize the addresses with an external program 'llvm-symbolizer' (when
searching for 'llvm-symbolizer', the directory containg the executable
is preferred over PATH).
'llvm-symbolizer' can be slow if the executable is large and/or if
llvm-symbolizer' itself is under-optimized. For example, my 'llvm-lto2' from a
-DCMAKE_BUILD_TYPE=Debug build is 443MiB. The 'llvm-symbolizer' from the same
build takes ~2s to symbolize it. (An optimized 'llvm-symbolizer' takes 0.34s).
A crashed clang may take more than 5s to symbolize a stack trace.
If a test file has several `not --crash` RUN lines. It can be very slow in a Debug build.
This patch makes `not --crash` set an environment variable to suppress symbolization.
This is similar to D33804 which uses a command line option.
I pick 'symbolization' instead of 'symbolication' because the former is
used much more commonly and its stem matches 'llvm-symbolizer'.
Also set LLVM_DISABLE_CRASH_REPORT=1, which is currently only applicable on
`__APPLE__`.
Reviewed By: dblaikie, aganea
Differential Revision: https://reviews.llvm.org/D86170
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81669
The only def for gc.relocate is a gc.statepoint. But real dependency is not
described by def-use chain. Instead this dependency is encoded by indecies
of operands in gc-live bundle of statepoint as integer constants in gc.relocate.
InstCombine operates by def-use chain. As a result when value in gc-live bundle
is simplified the gc.statepoint itself is not simplified but it might simplify dependent
gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger
check of all dependent gc.relocates by adding them to worklist.
This CL handles of gc.relocates as process of gc.statepoint optimization considering
gc.statepoint and related gc.relocate as whole entity.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85954
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as binary operator, leading to an assertion
failure or segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
This patch is the initial support for the Intial Exec Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D81947
Recommit the patch after fixing an issue reported caused by the fact
that re-used values are also added to InsertedValues.
Additional tests have been added in 88818491b9
This reverts the revert commit 38884641f2.
The original commit (7ff0ace96db9164dcde232c36cab6519ea4fce8) was causing
build failure and was reverted in 6d242a7326
==================== Original Commit Message ====================
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.
The syntax is:
```
debug_abbrev:
- ID: 0
Table:
...
- ID: 1
Table:
...
debug_info:
- ...
AbbrevTableID: 1 ## Reference the second abbrev table.
- ...
AbbrevTableID: 0 ## Reference the first abbrev table.
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83116
PseudoBRIND had seemingly inherited incorrect annotations denoting it as
a call instruction and that it defines X1/ra. This caused excess
save/restore code to be emitted for ra.
Differential Revision: https://reviews.llvm.org/D86286
Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.
Differential Revision: https://reviews.llvm.org/D84403
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.
Differential Revision: https://reviews.llvm.org/D81638
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.
Differential Revision: https://reviews.llvm.org/D84522
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Differential Revision: https://reviews.llvm.org/D82788
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.
Differential Revision: https://reviews.llvm.org/D82438
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.
Differential Revision: https://reviews.llvm.org/D82091
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.
Differential Revision: https://reviews.llvm.org/D85980
In e99dee82b0, the "out_of_memory_new_handler" was changed to be
explicitly initialized instead of relying on a global static
constructor.
However before this change, install_out_of_memory_new_handler could be
called multiple times while it asserts right now.
We can be more tolerant to calling multiple time InitLLVM without
reintroducing a global constructor for this handler.
Differential Revision: https://reviews.llvm.org/D86330
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.
The syntax is:
```
debug_abbrev:
- ID: 0
Table:
...
- ID: 1
Table:
...
debug_info:
- ...
AbbrevTableID: 1 ## Reference the second abbrev table.
- ...
AbbrevTableID: 0 ## Reference the first abbrev table.
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83116
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
the dynamic shared memory, which size is not known at the
compile time.
Reviewers: arsenm, yaxunl, kpyzhov, b-sumner
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82496
Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing
us to treat the high bits as zero even though they're (by definition)
unknown.
Differential Revision: https://reviews.llvm.org/D86323
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.
This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.
The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead()
make sure there are no SGPR spills pending during PEI. But the FP/BP
spills happen during PEI and are exceptions.
Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to
accommodate the exceptions.
Differential Revision: https://reviews.llvm.org/D86291
This patch is the initial support for the General Dynamic Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: NeHuang
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D82315
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.
Differential Revision: https://reviews.llvm.org/D86114
This ensures that we never encode an instruction which is unavailable,
such as if we explicitly insert a forbidden instruction when lowering.
This is particularly important on RISC-V given its high degree of
modularity, and will become increasingly important as new standard
extensions appear.
Reviewed By: asb, lenary
Differential Revision: https://reviews.llvm.org/D85015
This exposes the module optimization pipeline as a pass that can be
applied stand-alone when using 'opt'. This helps ml inliner training
scenarios, where we start with IR captured right before inlining,
perform the inlining (-scc-oz-module-inliner) and then want to continue
and observe the final IR (where this patch comes into play). We can then
apply llc on the resulting IR to continue compilation down to native.
Differential Revision: https://reviews.llvm.org/D86224
The normal scheme for tail folding reductions is to use:
loop:
p = phi(0, a)
mask = ...
x = masked_load(..., mask)
a = add(x, p)
s = select(mask, a, p)
This means we need to keep the register p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This ensures the select in the loop and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.
Differential Revision: https://reviews.llvm.org/D84741
The getSrcFromCopy helper nowadays return a MachineOperand pointer,
so talking about zero_reg was incorrect as it nowadays return
a nullptr when not finding a copy like instruction.
Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case.
That is problematic when we propagate a range from call site returned position to floating position.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86196
For scalable vector shifts the prediacte is typically all active,
which gets selected to an unpredicated shift by immediate. When
code generating for fixed length vectors the predicate is based
on the vector length and so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.
Differential Revision: https://reviews.llvm.org/D86204
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.
This was caught using the check introduced by D80916.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86149
Relanded since the buildbot issue was unrelated to this commit.
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
The byte swapping, when dealing with 4 byte (float) FP constants
in DwarfExpression::addConstantFP, added in commit ef8992b9f0
was not correct. It always performed byte swapping using an
uint64_t value. When dealing with 4 byte values the 4 interesting
bytes ended up in the big end of the uint64_t, but later we emitted
the 4 bytes at the little end. So we ended up with zeroes being
emitted and faulty debug information.
This patch simplifies things a bit, IMHO. Using the APInt
representation throughout the function, instead of looking at
the internal representation using getRawBytes and without using
reinterpret_cast etc. And using API.byteSwap() should result in
correct byte swapping independent of APInt being 4 or 8 bytes.
Differential Revision: https://reviews.llvm.org/D86272
This reverts commit dfd447c220.
After I pushed this commit, llvm-sphinx-docs started failing, due to:
Warning, treated as error:
extension 'recommonmark' has no setup() function;
is it really a Sphinx extension module?
I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.
An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.
Differential Revision: https://reviews.llvm.org/D85887
The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.
This is cleaner, because avoids the need of using a particular machine.
In this patch I've made the 'Machine' key optional (the default value,
when it is not specified is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.
There are few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or doing anything more complex
than a removing a "Machine: *" line and formatting lines around.
Differential revision: https://reviews.llvm.org/D86202
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.
This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html
Reviewed By: leonardchan, rjmccall
Differential Revision: https://reviews.llvm.org/D85312
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for simple loop
```
for(int i=0; i<32; ++i){
D = Arr2[i*8 + C1];
Arr1[i*64 + C2] += C3 * D;
Arr1[i*64 + C2 + 2048] += C4 * D;
}
```
current default parameter value is not enough to run deeper cost analyze so the
loop is not completely unrolled.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D86248
Use the stack to save and restore the link register when there is no
available register to do it.
Differential Revision: https://reviews.llvm.org/D76069
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D86145
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611
This reverts commit b0b32e6490.
We don't need a std::string for a literal string, we can use a
StringRef.
The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute. This patch
adds -force-remove-attribute that removes an attribute from function.
Differential Revision: https://reviews.llvm.org/D85586
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
Differential Revision: https://reviews.llvm.org/D86113
This patch was reverted in 7c182663a8 due to some failures
observed on PCC based machines. Failures were due to Endianness issue and
long double representation issues.
Patch is revised to address Endianness issue. Furthermore, support
for emission of `DW_OP_implicit_value` for `long double` has been removed
(since it was unclean at the moment). Planning to handle this in
a clean way soon!
For more context, please refer to following review link.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.
It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.
Differential Revision: https://reviews.llvm.org/D85720
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.
For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.
Differential Revision: https://reviews.llvm.org/D85784
Previously we weren't adding the LegalizerInfo to the post-legalizer
combiner. Since that's fixed, we don't need to try to filter out the
one case that was breaking.
D85820 introduced a full path in the LLVM_SYSTEM_LIBS property of the
LLVMSupport target, which made the OCaml bindings fail to build, since
they use -l [system_lib] flags for every lib in LLVM_SYSTEM_LIBS, which
cannot work with absolute paths.
This patch solves the issue in a similar vain as ZLIB does it: it adds
the full library path to imported_libs, and adds a stripped down version
without directories, lib prefix and lib suffix to system_libs
In the future we should probably make some changes to LLVM_SYSTEM_LIBS,
since both zlib and ncurses do not necessarily have to be system libs
anymore due to the find_package / find_library bits introduced in
D85820 and D79219.
Patch By: haampie
Differential Revision: https://reviews.llvm.org/D86134
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.
Differential Revision: https://reviews.llvm.org/D86155
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.
This reverts commit 71e0b82c9f.
Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly.
Differential Revision: https://reviews.llvm.org/D86158
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.
Differential Revision: https://reviews.llvm.org/D85544
This implements the assemble and disassemble support of RISCV Vector
extension Zvlsseg instructions, base on the 0.9 spec version.
Reviewed by HsiangKai
Differential Revision: https://reviews.llvm.org/D84416
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.
Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.
I've added new tests in
CodeGen/AArch64/sve-split-load.ll
CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorLoads.
Differential Revision: https://reviews.llvm.org/D85909
Summary:
When the resource descriptor is of vgpr, we need a waterfall loop
to read into a sgpr. In this patchm we generalized the implementation
to work for any regster class sizes, and extend the work to MIMG
instructions.
Fixes: SWDEV-223405
Reviewers:
arsenm, nhaehnle
Differential Revision:
https://reviews.llvm.org/D82603
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.
This was initially reported to me by David Major
as a clang crash during gecko build for android.
These instructions weren't in the initial version of MMX, but
were added when SSE1 was introduced. We already have the intrinsic
named correctly to include sse and the frontened header enforces
sse. We have one place in the backend where we DAG combine to
this intrinsic, but that's also qualified. So don't know of anything
currently broken unless someone writes their own IR and doesn't
set the sse feature.
We probably want to introduce pseudo-instructions at some point, like
we have for binary operations, but this seems okay for now.
One thing I'm not sure about is whether we should be doing this as a
DAGCombine instead of directly pattern-matching it. I don't see any big
downside to doing it this way, though.
Differential Revision: https://reviews.llvm.org/D85681
This isn't necessaary for ACLE, but could be useful in other situations.
And the change is simple.
Differential Revision: https://reviews.llvm.org/D85251
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.
Add a helper function, `addFlagsUsingAttrFn` which performs the attribute
setting.
Then, use wrappers for that function in `lowerCall` and `setArgFlags`.
(Note that the flag-setting code in `setArgFlags` was missing the returned
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)
Differential Revision: https://reviews.llvm.org/D86159
Similar to this commit:
faf8065a99
Testcase is pretty much the same as
test/CodeGen/AArch64/tailcall-explicit-sret.ll
Except it uses i64 (since we don't handle the i1024 return values yet), and
doesn't have indirect tail call testcases (because we can't translate those
yet).
Differential Revision: https://reviews.llvm.org/D86148
Parsing DWARFv5 debug_loclist offsets when a CU is parsed is weighing
down memory usage of symbolizers that don't need to parse this data at
all. There's not much benefit to caching these anyway - since they are
O(1) lookup and reading once you know where the offset list starts (and
can do bounds checking with the offset list size too).
In general, I think it might be time to start paying down some of the
technical debt of loc/loclist/range/rnglist parsing to try to unify it a
bit more.
eg:
* Currently DWARFUnit has: RangeSection, RangeSectionBase, LocSection,
LocSectionBase, LocTable, RngListTable, LoclistTableHeader (be nice if
these were all wrapped up in two variables - one for loclists, one for
rnglists)
* rnglists and loclists are handled differently (see:
LoclistTableHeader, but no RnglistTableHeader)
* maybe all these types could be less stateful - lazily parse what they
need to, even reparsing rather than caching because it doesn't seem
too expensive, for instance. (though admittedly so long as it's
constantcost/overead per compilatiton that's probably adequate)
* Maybe implementing and using a DWARFDataExtractor that can be
sub-ranged (so we could slice it up to just the single contribution) -
though maybe that's not so useful because loc/ranges need to refer to
it by absolute, not contribution-relative mechanisms
Differential Revision: https://reviews.llvm.org/D86110
This is restricted to single use loads, which if we fold to sextloads we can
find more optimal addressing modes on AArch64.
This also fixes an overload the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.
Differential Revision: https://reviews.llvm.org/D85966
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.
Differential Revision: https://reviews.llvm.org/D85965
VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up loading extra values into off vector lanes, not
effecting the on lanes. The same is true for loads in general where so
long as we are not using the other vector lanes, an unpredicated load
can be converted to a predicated one.
This marks VLD2 and VLD4 instructions as validForTailPredication and
allows any unpredicated load in tail predication loop, which seems to be
valid given the other checks we have.
Differential Revision: https://reviews.llvm.org/D86022
There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.
Differential Revision: https://reviews.llvm.org/D86087
This is a non-functional-change to generalize the printIR routines so that
the output can be saved and manipulated rather than being directly output
to dbgs(). This is a prerequisite change for many upcoming changes that
allow new ways of examining changes made to the IR in the new pass manager.
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D85999
We weren't looking through the parameters on calls at all.
E.g., say you had
```
declare i32 @zext(i32 zeroext %x)
...
%y = call i32 @zext(i32 %something)
...
```
At the point of the call, we wouldn't know that the %something should have the
zeroext attribute.
This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.
Differential Revision: https://reviews.llvm.org/D86125
Summary:
This is a follow up for D82481. For .lcomm directive, although it's
not necessary to have .rename emitted, it's still desirable to do
it so that we do not see internal 'Rename..' gets print out in
symbol table. And we could have consistent naming between TC entry
and .lcomm. And also have consistent naming between IR and final
object file.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D86075
Allow non-VLX targets to use 512-bits VPERMV/VPERMV3 for 128/256-bit shuffles.
TBH I'm not sure these targets actually exist in the wild, but we're testing for them and its good test coverage for shuffle lowering/combines across different subvector widths.
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
We currently call the `llvm_unreachable` for the following YAML:
```
--- !ELF
FileHeader:
Class: ELFCLASS32
Data: ELFDATA2LSB
Type: ET_REL
Machine: EM_NONE
Flags: [ ]
```
it happens because the `Flags` key is present, though `EM_NONE` is a
machine type that has no known `EF_*` values and we call `llvm_unreachable` by mistake.
Differential revision: https://reviews.llvm.org/D86138
AMDGPU ISA isn't backwards compatible and hence -mcpu must always be specified during disassembly.
However, the AMDGPU target CPU is stored in e_flags in the ELF object.
This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D84519
Right shift patterns will no longer incorrectly accept a shift
amount of zero. At the same time they will allow larger shift
amounts that are now saturated to their upper bound.
Patterns have been extended to enable immediate forms for shifts
taking an arbitrary predicate.
This patch also unifies the code path for immediate parsing so the
i64 based shifts are no longer treated specially.
Differential Revision: https://reviews.llvm.org/D86084
This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements).
We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat.
For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result.
This should address the last regressions in D66004
Differential Revision: https://reviews.llvm.org/D86093
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85184
Theory was that we should never reach a non-type unit (eg: type in an
anonymous namespace) when we're already in the invalid "encountered an
address-use, so stop emitting types for now, until we throw out the
whole type tree to restart emitting in non-type unit" state. But that's
not the case (prior commit cleaned up one reason this wasn't exposed
sooner - but also makes it easier to test/demonstrate this issue)
This reads more like what you'd expect the DWARF to look like (from the
lexical order of C++ - template parameters come before members, etc),
and also happens to make it easier to tickle (& thus test) a bug related
to type units and Split DWARF I'm about to fix.
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.
Running only the attributor-cgscc pass on a IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
Before we tired to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` not
already determined `nonnull`.
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.
With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.
This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.
Differential revision: https://reviews.llvm.org/D85820
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
Doesn't really matter in practice but that's how the nodes are
normally created by SelectionDAGBuilder. So we should match.
Found by temporarily hacking type checks into isel table.
This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern
matching doesn't check so it doesn't matter in practice. Maybe for
SelectionDAG CSE it would matter.
Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.
Differential Revision: https://reviews.llvm.org/D85674
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.
So far only the X86 disassemblers are supported.
Test Plan:
llvm-objdump -d --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
mov eax, dword ptr [rsp]
cmp eax, dword ptr [rip + 4112] # 202182 <g>
jge 0x20117e <_start+0x25>
call 0x201158 <foo>
inc dword ptr [rsp]
jmp 0x201169 <_start+0x10>
xor eax, eax
pop rcx
ret
```
llvm-objdump -d **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
<L1>:
mov eax, dword ptr [rsp]
cmp eax, dword ptr <g>
jge <L0>
call <foo>
inc dword ptr [rsp]
jmp <L1>
<L0>:
xor eax, eax
pop rcx
ret
```
Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D84191
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.
In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.
We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.
On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).
Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.
Also, fixes addition of the suffix to the value names.
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.
However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.
Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
SUMMARY:
1. This patch provided API for decoding the traceback table info and unit test for the these API.
2. Another patchs will do the following things:
2.1 added a new option --traceback-table to decode the trace back table information for xcoff object file when
using llvm-objdump to disassemble the xcoff objfile.
2.2 print out the traceback table information for llvm-objdump.
Reviewers: Jason liu, Hubert Tong, James Henderson
Differential Revision: https://reviews.llvm.org/D81585
The "isa" checks were less constrained because they allow
target constants, but the later matching code would bail
out on those anyway, so this should be slightly more
efficient.
this bug was causing miscompile.
now clang cant properly selfhost with -mllvm --enable-knowledge-retention
Reviewed By: jdoerfert, lebedev.ri
Differential Revision: https://reviews.llvm.org/D83507
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
... if the collected file doesn't exists.
This fixes the situation where LLDB can't create a file when capturing a
reproducer because the parent path doesn't exist, but can during replay
because the file collector created the directory hierarchy even though
the file doesn't exist.
This is covered by the lldb reproducer test suite.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.
This shaves off a bit of compile-time.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.
In some cases, this brings a 2x speedup in terms of compile-time.
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.
Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.
No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D85667
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85650
The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.
Reviewed By: lhames, abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D84787
The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.
Patch by Yuichi Sugiyama.
Reviewed By: lenary, jrtc27, pzheng
Differential Revision: https://reviews.llvm.org/D85067
This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.
Patch by msizanoen1
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D84833
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.
I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.
This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.
Known A: 0_______0_______0_______0_______
Known B: 0_______0_______0_______0_______
AOut: 00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch: 00000000001111111000000000000000
Committed on behalf of: @rrika (Erika)
Differential Revision: https://reviews.llvm.org/D72423
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.
We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
Support f128 using VE instructions. Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D86035
With gcc 6.3.0, I hit the following compilation bug.
../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
};
^
cc1plus: all warnings being treated as errors
The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.
The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a `call`.
Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S
This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.
I think, this is a missing fold in InstCombine, so i've implemented it.
I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
If it was extracted from the aggregate with identical type,
from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
then we can just use said original source aggregate directly,
instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
we need be PHI-aware - we might have different source aggregate when coming
from each predecessor.
I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.
I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% |
| instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% |
| instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% |
| instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% |
| instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% |
| instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% |
| simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% |
| instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% |
| instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% |
| instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% |
| instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% |
| instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% |
| instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% |
| instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% |
| asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% |
| instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% |
| inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% |
| inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% |
| instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% |
| instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% |
| simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% |
| simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% |
| instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% |
| instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% |
| instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% |
| instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% |
| instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% |
| instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% |
| instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% |
| instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% |
| simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% |
| instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% |
| instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% |
| instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% |
| asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% |
| inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% |
| inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% |
```
Roughly similar effect, less instructions and blocks total.
See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85787
We can now enable this for AVX1 targets can now assist with canonicalizeShuffleMaskWithHorizOp cleanup.
There's still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.
Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size.
This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
This is a resubmit of https://reviews.llvm.org/D83743
MachOLinkGraphBuilder has been treating these as hidden, but they should be
treated as local.
Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are
run through 'ld -r' without passing -keep_private_externs. They will show up
under 'nm -m' as "was private extern", hence the name of the test cases.
Testcase commited as relocatable object to ensure that the test suite doesn't
depend on having 'ld -r' available.
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.
Differential Revision: https://reviews.llvm.org/D85964
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
The VGPR component is a 32-bit offset, not 64-bits.
I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or relying on DAG divergence information for the base
address.
When adding elements when iterating, the iterator will become
valid, which could cause errors. This fixes the issue by using
indexes instead of iterator.
This patch internalize non-exact functions and replaces of their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.
We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace use of them (the
result of the other version of this function should be the same.).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.
For now when specified in commandline, we internalize all functions
that meet the requirements without calculating the cost of such
internalzation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84167
This reverts commit 6dbf0cfcf7.
That commit caused failed assertions, e.g. like this:
$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }
$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
This code becomes dead for valid IR after 48f4312 and a96fc46. The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
It did not process hazard for ds_permute because it does not
load or store even though it is DS.
Differential Revision: https://reviews.llvm.org/D86003
Transformation creates big strings for big C values, so bail out for C > 128.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86004
This would be a problem if the entire instrumented function was a call
to
e.g. memcpy
Use FnPrologueEnd Instruction* instead of ActualFnStart BB*
Differential Revision: https://reviews.llvm.org/D86001
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
This allows us to add addtional instrumentation before the function start,
without splitting the first BB.
Differential Revision: https://reviews.llvm.org/D85985
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
A unique module id, which is a part of sinit and sterm function names, is
necessary to be unique. However, `getUniqueModuleId` will fail if there is
no strong external symbol within a module. We turn to use Pid and timestamp
when this happens.
Differential Revision: https://reviews.llvm.org/D85527
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85269
Define the platform ID = 10, and simple mappings between platform ID & name.
Reviewed By: MaskRay, cishida
Differential Revision: https://reviews.llvm.org/D85594
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
Also doesn't try to handle undef elements like the DAG version.
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining
Another step towards PR41813
Recommit of rG9bd97d036398 with fixed offset adjustments
Unfortunately this ends up not working as expected on targets with
16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform
16-bit ops to i32.
The vector case annoyingly requires switching the checked opcode,
since constants for vectors aren't directly handled.
I also need to think more carefully about whether this is valid for i1.
Extend FixupStatepointCallerSaved pass with ability to spill
statepoint GC pointer arguments (optionally allowing them on CSRs).
Special handling is required for invoke statepoints, because at MI
level single landing pad may be shared by multiple statepoints, so
we must ensure we spill landing pad's live-ins into the same stack
slots.
Full statepoint refactoring change set is available at D81603.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D81647
Remove I8/I16 register classes which are prepared to implement previously
to implement VE ABI. However, it is possible to implement VE ABI correctly
without them. Therefore, removing them now.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D85905
This loop caused me a little headache once, because I didn't see the assigned variable is a member. The refactored version appears more readable to me.
Differential Revision: https://reviews.llvm.org/D85922
PAL recently got support for multiple ELF sections and relocations,
therefore we can now use .rodata sections instead of forcing constants
into .text.
Differential Revision: https://reviews.llvm.org/D85895
The code wasn't taking into account that the two operands
passed to ptest could be identical and was trying to erase
them twice.
Differential Revision: https://reviews.llvm.org/D85892
When getUserCost was transitioned to use an explicit CostKind,
TCK_CodeSize was used even though the original kind was implicitly
SizeAndLatency so restore this behaviour. We now only query for
CodeSize when optimising for minsize.
I expect this to not change anything as, I think all, targets will
currently return the same value for CodeSize and SizeLatency. Indeed
I see no changes in the test suite for Arm, AArch64 and X86.
Differential Revision: https://reviews.llvm.org/D85829
dumpStringOffsetsSection() expects the size of a contribution to be
correctly aligned. The patch adds the corresponding verifications for
pre-v5 cases.
Differential Revision: https://reviews.llvm.org/D85739
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.
Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.
It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.
Differential Revision: https://reviews.llvm.org/D85871
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.
Hidden symbols via --fvisiblity=hidden (https://gcc.gnu.org/wiki/Visibility)
is an important scenario.
Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going
back to the code reviews why the canBenefitFromLocalAlias function
currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the
--wrap linker option and making the behavior of --wrap consistent
between LTO and normal builds for references within a translation-unit.
Note: this --wrap behavior (which is specific to LLD) should not be
considered reliable. See comments on https://reviews.llvm.org/D73230
for more.
Differential Revision: https://reviews.llvm.org/D85782
Previously ConstantFoldExtractElementInstruction() would only work with
insertelement instructions, not contants. This properly handles
insertelement constants as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85865
If we need a scratch register for the spill don't use the same scratch
register that is being used for the MBUF offset.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D85772
Allow inlining only when the Callee has a subset of the Caller's
features. In principle, we should be able to inline regardless of any
features because WebAssembly supports features at module granularity,
not function granularity, but without this restriction it would be
possible for a module to "forget" about features if all the functions
that used them were inlined.
Requested in PR46812.
Differential Revision: https://reviews.llvm.org/D85494
This change moves elfabi related code to llvm/InterfaceStub library
so it can be shared by multiple llvm tools without causing cyclic
dependencies.
Differential Revision: https://reviews.llvm.org/D85678
Refactoring function `writeArchive` in ArchiveWriter. Added a new
function `writeArchiveBuffer` that returns the archive in a memory
buffer instead of writing it out to the disk. This refactor is necessary
so as to allow `llvm-libtool-darwin` to write universal files containing
archives.
Reviewed by jhenderson, MaskRay, smeenai
Differential Revision: https://reviews.llvm.org/D84858
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
These operations take Qda and Rn register operands, which are
commutative so long as the instruction is not predicated.
Differential Revision: https://reviews.llvm.org/D85813
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.
Resolves problem with
Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85912
InstCombine adds users of transformed instruction to working list to
process on the same iteration. However gc.relocate may have a hidden
user (next gc.relocate) which is connected through gc.statepoint intrinsic and
there is no direct def-use chain between them.
In this case if the next gc.relocation is already processed it will not be added
to worklist and will not be able to be processed on the same iteration.
Let's we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C is already considered then after replacement of A with null, statepoint B
instruction will be added to the queue but not C.
C can be processed only on the next iteration.
If the chain of relocation is pretty long the many iteration may be required.
This change is to reduce the number of iteration to meet the latest changes
related to reducing infinite loop threshold.
This is a quick (not best) fix. In the follow up patches I plan to move gc relocation
handling into statepoint handler. This should also help to remove unused gc live
entries in statepoint bundle.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
If the referenced symbol of a J[U]MP_SLOT is invalid (e.g. symbol index 0), llvm-objdump -d will bail out:
```
error: 'a': st_name (0x326600) is past the end of the string table of size 0x7
```
where 0x326600 is the st_name field of the first entry past the end of .symtab
Change it to a warning to continue dumping.
`X86/plt.test` uses a prebuilt executable, so I pick `ELF/AArch64/plt.test`
which has a YAML input and can be easily modified.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85623