This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The crash comes from mismatch between load count in epilogue and seh instruction count.
Still because of the pass AArch64LoadStoreOpt. It remove some load in the epilogue but haven't remove the corresponding seh instruction.
This patch don't optimize the load in the epilogue to fix the issue.
Fix: #58516
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D136430
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.
Fixes issue discovered at https://reviews.llvm.org/D130049#3704064
Differential Revision: https://reviews.llvm.org/D134992
According D125377, we order STP Q's by ascending address. While on some
targets, paired 128 bit loads and stores are slow, so the STP will split
into STRQ and STUR, so I hope these stores will also be ordered.
Also add subtarget feature ascend-store-address to control the aggressive order.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D126700
[Re-commit after fixing a dereference of "end" iterator]
The AArch64LoadStoreOptimnizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.
This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D114547
The AArch64LoadStoreOptimnizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.
This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D114547
FirstMI is only used to get the load/store operand and the machine
function. Pass the MF and register explicitly, so the helper can be used
to find rename registers for other instructions in the future.
The content of undef registers are not used in meaningful ways, when checking
if a rename register is used between paired instructions we should ignore
undef registers.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D119305
This reverts commit ea011ec5ed.
This still causes some miscompiles, I'll follow up in the phabricator
review with a sample of that issue (which is part of the sample of
the previous issue).
This is a recommit that fixes unwanted STP generation by checking that
the base register has not been modified or used elsewhere.
Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.
Differential Revision: https://reviews.llvm.org/D103597
Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.
Differential Revision: https://reviews.llvm.org/D103597
This patch fixes an issue where a pre-indexed store e.g.,
STR x1, [x0, #24]! with a store like STR x0, [x0, #8] are
merged into a single store: STP x1, x0, [x0, #24]!
. They shouldn’t be merged because the second store uses
x0 as both the stored value and the address and so it needs to be using the updated x0.
Therefore, it should not be folded into a STP <>pre.
Additionally a new test case is added to verify this fix.
Differential Revision: https://reviews.llvm.org/D101888
Change-Id: I26f1985ac84e970961e2cdca23c590fa6773851a
This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and
LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single
STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.
Differential Revision: https://reviews.llvm.org/D99272
Change-Id: Ie97a20c8c716c08492fe229c22e14e3c98ef08b7
Instead of outright disabling this completely with the noredzone attribute,
we only avoid doing the optimization if there are memory operations between
the adjustment and the load/store that the adjustment would be folded into.
This avoids the case of something like a stack cookie being corrupted if an
exception happens before the pre-increment to the SP occurs.
This also prevents the folding happening if we have a redzone, but the offset
being folded is above the redzone amount (128 bytes in this case).
rdar://73269336
Differential Revision: https://reviews.llvm.org/D95179
This matches the corresponding existing case in
AArch64LoadStoreOpt::findMatchingUpdateInsnForward.
Both cases could also be modified to check
MBBI->getFlag(FrameSetup/FrameDestroy) instead of forbidding any
optimization involving SP, but the effect is probably pretty much
the same.
Differential Revision: https://reviews.llvm.org/D88541
When pairing loads, we should check if in between the two loads the
base register has been modified. If that is the case then avoid pairing
them because the second load actually loads from a different address.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D86956
When matching store instruction for ldst opt, we should make sure store instr is in 'reg+imm' form as load instr,
otherwise, it will have assertion in isLdOffsetInRangeOfSt since it will use getImm() directly.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87905
When pairing ldr instructions to an ldp instruction, we cannot pair two ldr
destination registers where one is a sub or super register of the other.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D86906
Currently aarch64-ldst-opt will incorrectly rename registers with
multiple disjunct subregisters (e.g. result of LD3). This patch updates
the canRenameUpToDef to bail out if it encounters such a register class
that contains the register to rename.
Fixes PR46105.
Reviewers: efriedma, dmgreen, paquette, t.p.northover
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81108
Summary:
Add a check to make sure that MachineInstr::mayAlias returns prematurely if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
Do not count the presence of debug insts against the limit set by
LdStLimit, and allow the optimizer to find matching insts by skipping
over debug insts.
Differential Revision: https://reviews.llvm.org/D78411
Summary:
When moving add and sub to memory operand instructions,
aarch64-ldst-opt would prematurally pop the stack pointer,
before memory instructions that do access the stack using
indirect loads.
e.g.
```
int foo(int offset){
int local[4] = {0};
return local[offset];
}
```
would generate:
```
sub sp, sp, #16 ; Push the stack
mov x8, sp ; Save stack in register
stp xzr, xzr, [sp], #16 ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr w0, [x8, w0, sxtw #2] ; Access correct position, but it is not guarded by SP
```
Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75755
This patch adds a new option to enable/disable register renaming in the
load-store optimizer. Defaults to disabled, as there is a potential
mis-compile caused by this.
If the root def of for renaming is a noop-pseudo instruction like kill,
we would end up without a correct def for the renamed register, causing
miscompiles.
This patch conservatively bails out on any pseudo instruction.
This fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1037912#c70
Summary:
r347747 added support for clustering mem ops with FI base operands
including support for fixed stack objects in shouldClusterFI, but
apparently this was never tested.
This patch fixes shouldClusterFI to work with scaled as well as
unscaled load/store instructions, and fixes the ordering of memory ops
in MemOpInfo::operator< to ensure that memory addresses always
increase, regardless of which direction the stack grows.
Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71334
In some cases, we can rename a store operand, in order to enable pairing
of stores. For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.
First, we check if we can rename the given register:
1. The first store register must be killed at the store, which means we
do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
stored register and check all uses in between are renamable. Along
they way, we collect the minimal register classes of the uses for
overlapping (sub/super)registers.
Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be
1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.
We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program base patch diff
test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70450
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
a generically extensible collection of extra info attached to
a `MachineInstr`.
The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.
Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.
I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).
Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.
This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.
The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.
Differential Revision: https://reviews.llvm.org/D50701
llvm-svn: 339940
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated). This avoids executing the
the copy on paths where their results aren't needed. This also exposes
additional opportunites for dead copy elimination and shrink wrapping.
These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..
For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.
```
%bb.0:
%wzr = SUBSWri %w1, 1
%w19 = COPY %w0
Bcc 11, %bb.2
%bb.1:
Live Ins: %w19
BL @fun
%w0 = ADDWrr %w0, %w19
RET %w0
%bb.2:
%w0 = COPY %wzr
RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.
With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted in spec2000/2006/2017 on AArch64.
Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz
Reviewed By: sebpop
Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41463
llvm-svn: 328237