llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
David Green	bd07c2e266	[AArch64] Prefer fmov over orr v.16b when copying f32/f64 This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to a fmov of the appropriate type. At least on some CPU's with 64bit NEON data paths this is expected to be faster, and shouldn't be slower on any CPU that treats fmov as a register rename. Differential Revision: https://reviews.llvm.org/D106365	2021-08-03 17:25:40 +01:00
Jon Roelofs	6611fbc62a	[AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC	2021-07-12 15:37:11 -07:00
Peter Waller	c5dfee44b9	[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory This avoids the use of the vector unit for copying from scalar to vector. There is an extra ptrue instruction, but a predicate register with the ptrue pattern populated is likely to be free in the context of real code. Tests were generated from a template to cover the axes mentioned at the top of the test file. Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D103170	2021-07-06 12:03:54 +00:00
Pablo Barrio	571c8c5263	[AArch64][v8.3A] Avoid inserting implicit landing pads (PACISP) PACISP have the advantage that they are in HINT space, meaning they can be run successfully in hardware without PAuth support - they will just behave as a NOP. However, PACISP are also implicit landing pads (think of an extra BTI jc). Therefore, they allow indirect jumps of all kinds into them, potentially inserting new gadgets. This patch replaces PACISP by PACI* LR, SP when compiling explicitly for hardware with full PAuth support. PACI* is not in the HINT space, therefore it will fault when run in hardware without PAuth support, but it is also not a landing pad, making programs safer in newer HW. Differential Revision: https://reviews.llvm.org/D101920	2021-06-24 18:24:32 +01:00
Nick Desaulniers	033138ea45	[IR] make stack-protector-guard-* flags into module attrs D88631 added initial support for: - -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset= flags, and D100919 extended these to AArch64. Unfortunately, these flags aren't retained for LTO. Make them module attributes rather than TargetOptions. Link: https://github.com/ClangBuiltLinux/linux/issues/1378 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102742	2021-05-21 15:53:30 -07:00
Nick Desaulniers	0f41778919	[AArch64] Support customizing stack protector guard Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags: 1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0 to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919	2021-05-17 11:49:22 -07:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Peter Waller	3fa6510f6e	[CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr When a ptest is used to set flags from the output of rdffr, the ptest can be eliminated, using a flags-setting rdffrs instead. Additionally, check that nothing consumes flags between rdffr and ptest; this case appears to have been missed previously. * There is no unpredicated RDFFRS instruction. * If substituting RDFFR_PP, require that the mask argument of the PTEST matches that of the RDFFR_PP. * Move some precondition code up inside optimizePTestInstr, so that it covers the new code paths for RDFFR which return earlier. * Only consider RDFFR, PTEST in same basic block. * Check for other flag setting instructions between the two, abort if found. * Drop an old TODO comment about removing dead PTEST instructions. RDFFR_P to follow in later patch. Differential Revision: https://reviews.llvm.org/D101357	2021-05-12 15:06:22 +01:00
Stelios Ioannou	936c777e2b	[AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR. This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively. For each pair, there is a MIR test that verifies this optimization. Differential Revision: https://reviews.llvm.org/D99272 Change-Id: Ie97a20c8c716c08492fe229c22e14e3c98ef08b7	2021-04-30 17:29:58 +01:00
Pavel Iliin	2ec16103c6	[AArch64] Peephole rule to remove redundant cmp after cset. Comparisons to zero or one after cset instructions can be safely removed in examples like: cset w9, eq cset w9, eq cmp w9, #1 ---> <removed> b.ne .L1 b.ne .L1 cset w9, eq cset w9, eq cmp w9, #0 ---> <removed> b.ne .L1 b.eq .L1 Peephole optimization to detect suitable cases and get rid of that comparisons added. Differential Revision: https://reviews.llvm.org/D98564	2021-04-19 19:58:38 +01:00
Andrew Savonichev	f037b07b5c	Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant" This reverts commit `cca9b5985c`. Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir: * Bad machine code: Virtual register killed in block, but needed live out. * - function: indexed_2s - basic block: %bb.0 entry (0x640fee8) Virtual register %7 is used after the block. * Bad machine code: Virtual register defs don't dominate all uses. * - function: indexed_2s - v. register: %7 LLVM ERROR: Found 2 machine code errors.	2021-04-12 16:28:49 +03:00
Andrew Savonichev	cca9b5985c	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-04-12 16:08:39 +03:00
Fangrui Song	5d44c92bf8	Change void getNoop(MCInst &NopInst) to MCInst getNop() Prefer (self-documenting) return values to output parameters (which are liable to be used). While here, rename Noop to Nop which is more widely used and improves consistency with hasEmitNops/setEmitNops/emitNop/etc.	2021-03-15 12:05:34 -07:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics, and a target hook for AArch64 is added, to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00
Jordan Rupprecht	752fafda3d	[NFC] Fix -Wsometimes-uninitialized After `49142991a6`, clang detects that MUL may be uninitialized. Set it to nullptr to suppress this check. Adding an assert to check that it is ultimately set fails two test cases. Since this is not a new issue, leave the assertion commented out until a code owner can fix the bug. The two failing test cases are noted in the assertion comment.	2021-01-13 20:32:38 -08:00
Kazu Hirata	4c1617dac8	[llvm] Use std::any_of (NFC)	2021-01-13 19:14:44 -08:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Mark Murray	af7cce2fa4	[AArch64] Add +pauth archictecture option, allowing the v8.3a pointer authentication extension. Differential Revision: https://reviews.llvm.org/D94083	2021-01-08 13:21:11 +00:00
Bradley Smith	c73ae747cb	[AArch64][SVE] Add optimization to remove redundant ptest instructions Co-Authored-by: Graham Hunter <graham.hunter@arm.com> Co-Authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D93292	2021-01-05 15:28:36 +00:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Chen Zheng	4830d458dd	[MachineCombiner][NFC] Add MustReduceRegisterPressure goal add a new goal MustReduceRegisterPressure for machine combiner pass. PowerPC will use this new goal to do some register pressure related optimization. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92068	2020-12-14 00:02:42 -05:00
Sander de Smalen	73b6cb67dc	[NFCI] Replace AArch64StackOffset by StackOffset. This patch replaces the AArch64StackOffset class by the generic one defined in TypeSize.h. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D88983	2020-11-04 08:49:00 +00:00
Anna Thomas	35cb45c533	[ImplicitNullChecks] Support complex addressing mode The pass is updated to handle loads through complex addressing mode, specifically, when we have a scaled register and a scale. It requires two API updates in TII which have been implemented for X86. See added IR and MIR testcases. Tests-Run: make check Reviewed-By: reames, danstrushin Differential Revision: https://reviews.llvm.org/D87148	2020-10-07 20:55:38 -04:00
Momchil Velikov	a88c722e68	[AArch64] PAC/BTI code generation for LLVM generated functions PAC/BTI-related codegen in the AArch64 backend is controlled by a set of LLVM IR function attributes, added to the function by Clang, based on command-line options and GCC-style function attributes. However, functions, generated in the LLVM middle end (for example, asan.module.ctor or __llvm_gcov_write_out) do not get any attributes and the backend incorrectly does not do any PAC/BTI code generation. This patch record the default state of PAC/BTI codegen in a set of LLVM IR module-level attributes, based on command-line options: * "sign-return-address", with non-zero value means generate code to sign return addresses (PAC-RET), zero value means disable PAC-RET. * "sign-return-address-all", with non-zero value means enable PAC-RET for all functions, zero value means enable PAC-RET only for functions, which spill LR. * "sign-return-address-with-bkey", with non-zero value means use B-key for signing, zero value mean use A-key. This set of attributes are always added for AArch64 targets (as opposed, for example, to interpreting a missing attribute as having a value 0) in order to be able to check for conflicts when combining module attributed during LTO. Module-level attributes are overridden by function level attributes. All the decision making about whether to not to generate PAC and/or BTI code is factored out into AArch64FunctionInfo, there shouldn't be any places left, other than AArch64FunctionInfo, which directly examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which is/will-be handled by a separate patch. Differential Revision: https://reviews.llvm.org/D85649	2020-09-25 11:47:14 +01:00
Philip Reames	e1a3271ebb	[AArch64] Teach analyzeBranch to remove branch equivelent to fallthrough The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzeable. With the introduction of the FAULTING_OP psuedo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path as BranchFolding handle's any fully analyzeable branch sequence without using this interface. p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.) Differential Revision: https://reviews.llvm.org/D88035	2020-09-22 14:38:27 -07:00
Philip Reames	b04c181ed7	[AArch64] Enable implicit null check transformation This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support: An implicit null check is the use of a signal handler to catch and redirect to a handler a null pointer. Specifically, it's replacing an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata. FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact it can transfer control in the CFG. FAULTING_OP does not need to be an analyzable branch to achieve it's purpose. (Or at least, that's the x86 model. I find this slightly questionable.) When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction. As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causing BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.) Differential Revision: https://reviews.llvm.org/D87851	2020-09-17 16:00:19 -07:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Owen Anderson	5987da8764	Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"" This reverts commit `bc9a29b9ee`. The reasoning that this patch was wrong was itself incorrect (see discussion on llvm-commits). This patch does seem to be exposing a latent SVE code generation bug on non-public tests, which should not block a correctness fix for public, non-SVE use cases.	2020-09-01 19:29:03 +00:00
Paul Walker	bc9a29b9ee	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `e9d9a61208`. This patch was previously revert by `04879086b4` with the reapplication being done after breaking the assert used to ensure SP is always 16-byte aligned, which is a requirement of the AAPCS. For extra context the latest patch caused runtime failures when building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".	2020-09-01 16:09:37 +01:00
Owen Anderson	e9d9a61208	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan Differential Revision: D70800	2020-08-27 17:29:41 +00:00
Puyan Lotfi	7bc03f5553	[MachineOutliner][AArch64] WA for multiple stack fixup cases in MachineOutliner. In cases where MachineOutliner candidates either are: * noreturn * have calls with no available LR or free regs * Don't use SP we can end up hitting stack fixup code for the caller and the callee for a FrameID of MachineOutlinerDefault. This triggers the assert: `assert(OF.FrameConstructionID != MachineOutlinerDefault && "Can only fix up stack references once");` in AArch64InstrInfo.cpp. This assert exists for now because a lot of the fixup code is not tested to handle fixing up more than once and needs some better checks and enhancements to avoid potentially generating illegal code. I've filed a Bugzilla report to track this until these cases are handled by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767 This diff detects cases that will cause these multiple stack fixups and prune the Candidates from `RepeatedSequenceLocs`. Differential Revision: https://reviews.llvm.org/D83923	2020-08-10 15:43:30 -04:00
Florian Hahn	f7658241cb	[AArch64] Consider instruction-level contract FMFs in combiner patterns. Currently, instruction level fast math flags are not considered when generating patterns for the machine combiner. This currently leads to some missed opportunities to generate FMAs in combination with `#pragma clang fp contract (fast)`. For example, when building the example below with -O3 for AArch64, no FMADD is generated. If built with -O2 and the DAGCombiner is used instead of the MachineCombiner for FMAs, an FMADD is generated. With this patch, the same code is generated in both cases. float madd_contract(float a, float b, float c) { #pragma clang fp contract (fast) return (a * b) + c; } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84930	2020-08-04 10:25:16 +01:00
Tim Northover	0f1494be43	AArch64: avoid UB shift of negative value Left shifting a negative value is undefined behaviour, so this just moves the negation afterwards to avoid it.	2020-07-27 13:49:50 +01:00
Eli Friedman	993c1a3219	[AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4. It's sort of tricky to hit this in practice, but not impossible. I have a synthetic C testcase if anyone is interested. The implementation is identical to the equivalent NEON register copies. Differential Revision: https://reviews.llvm.org/D84373	2020-07-23 16:41:37 -07:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Kristof Beyls	c35ed40f4f	[AArch64] Extend AArch64SLSHardeningPass to harden BLR instructions. To make sure that no barrier gets placed on the architectural execution path, each BLR x<N> instruction gets transformed to a BL __llvm_slsblr_thunk_x<N> instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains __llvm_slsblr_thunk_x<N>: BR x<N> <speculation barrier> Therefore, the BLR instruction gets split into 2; one BL and one BR. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber X16 and X17 on function calls, the above code transformation would not be correct in case a linker does so when N=16 or N=17. Therefore, when the mitigation is enabled, generation of BLR x16 or BLR x17 is avoided. As BLRA* indirect calls are not produced by LLVM currently, this does not aim to implement support for those. Differential Revision: https://reviews.llvm.org/D81402	2020-06-12 07:34:33 +01:00
Kristof Beyls	0ee176edc8	[AArch64] Introduce AArch64SLSHardeningPass, implementing hardening of RET and BR instructions. Some processors may speculatively execute the instructions immediately following RET (returns) and BR (indirect jumps), even though control flow should change unconditionally at these instructions. To avoid a potential miss-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every RET and BR instruction. Since these barriers are never on the correct, architectural execution path, performance overhead of this is expected to be low. On targets that implement that Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by a ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to avoid later passes to analyze them and potentially remove them. Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ instructions, these are also mitigated by the pass and tested through a MIR test. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D81400	2020-06-11 07:51:17 +01:00
hsmahesha	0ed2c04636	[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on max number of mem ops that can be clustered. This patch adds support to pass number of clustered bytes to target mem ops clustering logic. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80545	2020-06-01 22:52:34 +05:30
Tim Northover	dace8224f3	AArch64: materialize large stack offset into xzr correctly. When a stack offset was too big to materialize in a single instruction, we were trying to do it in stages: adds xD, sp, #imm adds xD, xD, #imm Unfortunately, if xD is xzr then the second instruction doesn't exist and wouldn't do what was needed if it did. Instead we can use a temporary register for all but the last addition.	2020-06-01 09:30:05 +01:00
Cullen Rhodes	8a397b66b2	[AArch64][SVE] Add support for spilling/filling ZPR2/3/4 Summary: This patch enables the register allocator to spill/fill lists of 2, 3 and 4 SVE vectors registers to/from the stack. This is implemented with pseudo instructions that get expanded to individual LDR_ZXI/STR_ZXI instructions in AArch64ExpandPseudoInsts. Patch by Sander de Smalen. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75988	2020-05-28 10:02:57 +00:00
Fangrui Song	0840d725c4	[MC] Change MCCFIInstruction::createDefCfaOffset to cfiDefCfaOffset which does not negate Offset The negative Offset has caused a bunch of problems and confused quite a few call sites. Delete the unneeded negation and fix all call sites.	2020-05-22 17:07:11 -07:00
Eli Friedman	f09d220c71	[AArch64][SVE] Fill out missing unpredicated load/store patterns. The set of patterns for unpredicated load/store was incomplete: it only included non-extending stores. Fill out the remaining patterns for extending stores, and add the corresponding support to frame offset lowering. Differential Revision: https://reviews.llvm.org/D80349	2020-05-21 13:29:30 -07:00
Eli Friedman	b4f9b34701	[AArch64] Fix unwind info generated by outliner. The offsets were wrong. The result is now the same as what the compiler would generate for a function that spills lr normally. Differential Revision: https://reviews.llvm.org/D80238	2020-05-20 16:39:00 -07:00
Eli Friedman	5d2c3a0b8c	[AArch64] Disable MachineOutliner on Windows. The handling of unwind info is broken, so disable it for now.	2020-05-19 13:49:03 -07:00
Vedant Kumar	c0fa447e02	AArch64: Remove reversedInstructionsWithoutDebug helper When using reversedInstructionsWithoutDebug to construct a range from a pair of MachineInstrBundleIterators, the range unexpectedly leaves out an element. This results in mis-optimization as @mstorsjo points out in https://reviews.llvm.org/D78157. The problem is that when we convert a MachineInstrBundleIterator to a reverse iterator, the result gets incremented: MachineInstrBundleIterator(++I.getReverse()) The comment there explains that the "resulting iterator will dereference ... to the previous node, which is somewhat unexpected; but converting the two endpoints in a range will give the same range in reverse". This makes it hard to understand what reversedInstructionsWithoutDebug will do: I've removed the helper to prevent similar mistakes in the future.	2020-04-24 11:28:17 -07:00

1 2 3 4 5 ...

426 Commits