This allows a number of optimisation passes to work, e.g. BranchFolding and MachineBlockPlacement.
Differential Revision: https://reviews.llvm.org/D131316
There is at least one Clang test (clang/test/CodeGen/arm_acle.c) which
has functions guarded by #if's that cause those functions to be compiled
only for a subset of RUN lines.
This results in a case where one RUN line has a body for the function
and another doesn't. Treat this case as a conflict for any prefixes that
the two RUN lines have in common.
This change exposed a bug where functions with '$' in the name weren't
properly recognized in ARM assembly (despite there being a test case
that was supposed to catch the problem!). This bug is fixed as well.
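To illustrate the kind of fix involved, here is a minimal sketch of a function-label matcher widened to accept '$' (the regex and helper name are illustrative, not the script's actual code):
```
import re

# Hypothetical sketch: an ARM-style function-label matcher that also accepts
# '$' in addition to the usual identifier characters.
FUNC_LABEL_RE = re.compile(r'^(?P<func>[\w.$-]+):')

def find_function_name(asm_line):
    """Return the label name if asm_line starts a function body, else None."""
    m = FUNC_LABEL_RE.match(asm_line)
    return m.group('func') if m else None

assert find_function_name('test$function:') == 'test$function'
```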
Differential Revision: https://reviews.llvm.org/D130089
The first attempt missed changing test files for tools
(update_llc_test_checks.py).
Original commit message:
This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.
As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.
This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.
Differential Revision: https://reviews.llvm.org/D129775
Add LoongArch assembly scrubbing and triple support to update_llc_test_checks.
Depends on D128432
Reviewed By: MaskRay, xen0n
Differential Revision: https://reviews.llvm.org/D128433
Previously, when we appended check lines at the end, we could not share prefixes. This patch should make that possible and allow us to reduce some check line counts (especially for Clang/OpenMP tests).
See also: https://reviews.llvm.org/D128686
Differential Revision: https://reviews.llvm.org/D128684
Use the query that doesn't assert if TracksLiveness isn't set, since the livein list needs to always be available. We also need to start printing liveins regardless of TracksLiveness.
Support the pattern where a test file uses multiple prefixes per run line:
one prefix that is unique to the run line, and additional prefixes that are
common with other run lines.
Decide on a per-function basis which prefix(es) to emit, based on which run
lines have the same output.
Move the renaming of vregs earlier, so that we can compare the output as it
would actually be printed in check lines.
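Roughly, the per-function decision can be sketched like this (the helper name and data shapes are illustrative, not the script's actual API): run lines are grouped by the output they produce for a function, and each group's checks are emitted under the prefixes shared by every run line in that group.
```
from collections import defaultdict

def choose_prefixes(func_output_by_run, prefixes_by_run):
    # Hypothetical sketch: group run lines by the output they produce for one
    # function, then find the prefixes shared by every run line in a group.
    runs_by_output = defaultdict(list)
    for run_id, output in func_output_by_run.items():
        runs_by_output[output].append(run_id)
    blocks = []
    for output, run_ids in runs_by_output.items():
        common = set.intersection(*(set(prefixes_by_run[r]) for r in run_ids))
        blocks.append((sorted(common), output))
    return blocks
```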
Differential Revision: https://reviews.llvm.org/D126411
This is scoped to autogenerated tests.
The goal is to support having each RUN line specify a list of check-prefixes in which some prefixes may turn out to be redundant. For example, for X86, if one specified prefixes for both AVX1 and AVX2, and the codegen happened to match today, one of the prefixes would be used and the other one not.
If the unused prefix were dropped, and later, codegen differences were
introduced, one would have to go figure out where to add what prefix
(paraphrasing
https://lists.llvm.org/pipermail/llvm-dev/2021-February/148326.html)
To avoid getting errors due to unused prefixes, whole directories can be
opted out (as discussed on that thread), but that means that tests that
aren't autogenerated in such directories could have undetected unused
prefix bugs.
This patch proposes an alternative that both avoids the directory-level opt-out above and supports the main autogen scenario discussed first. The autogen tool appends the list of unused prefixes at the end of the test file, together with a note explaining that this is the case. Each prefix is set up to always pass.
This way, unexpected unused prefixes are easily discoverable, and
expected cases "just work".
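A minimal sketch of how such an always-passing block could be appended (the helper name and note wording are illustrative, not the tool's exact output):
```
def append_unused_prefix_notes(test_lines, unused_prefixes, comment_marker=';'):
    # Hypothetical sketch: append a note plus one trivially-passing check per
    # unused prefix so FileCheck never errors out on them.
    if not unused_prefixes:
        return test_lines
    out = list(test_lines)
    out.append(comment_marker * 2 + ' NOTE: These prefixes are unused; the lines below keep them passing.')
    for prefix in sorted(unused_prefixes):
        # '{{.*}}' matches anything, so the prefix always has at least one hit.
        out.append('%s %s: {{.*}}' % (comment_marker, prefix))
    return out
```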
Differential Revision: https://reviews.llvm.org/D124306
I had initially assumed this was the problem with
https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243
But it turns out that was a simpler issue. This patch is still more correct than what we were doing before, so I figured I'd submit it anyway.
No test case because I'm not sure how to get an undef around
until expansion.
Looking at the test deltas, I wonder if it would be valid to combine (sext_inreg (freeze (aextload X))) -> (freeze (sextload X)).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126175
Variable captures such as `<MCInst #` can change based on unrelated changes to the LLVM backends. To avoid the generated test cases changing along with them, use an incrementing counter for variable names instead of using the actual value from the output file.
This change may also be beneficial for some nameless IR variables
(especially when combined with filtering of output), but for now I've
restricted this change to the obvious candidates (--asm-show-inst output).
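A minimal sketch of the counter-based renaming (the helper name is illustrative, not the script's actual code): the first distinct value seen becomes variable 1, the next becomes 2, and so on, so the generated names no longer depend on the backend's numbering.
```
import re

def stabilize_mcinst_captures(lines):
    # Hypothetical sketch: name FileCheck variables by an incrementing counter
    # (MCINST1, MCINST2, ...) in order of first appearance, instead of
    # embedding the backend's actual instruction number in the name.
    counter = 0
    name_for_value = {}

    def repl(m):
        nonlocal counter
        value = m.group(1)
        if value not in name_for_value:
            counter += 1
            name_for_value[value] = 'MCINST%d' % counter
            # The first occurrence defines the variable, later ones reference it.
            return '<MCInst #[[%s:[0-9]+]]' % name_for_value[value]
        return '<MCInst #[[%s]]' % name_for_value[value]

    return [re.sub(r'<MCInst #(\d+)', repl, line) for line in lines]
```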
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D125405
To avoid test churn when backends add/rename new instructions/registers,
it makes sense to use FileCheck captures for the exact MCInst/Reg number.
This is motivated by D125307, where I use --asm-show-inst to differentiate
the output for multiple instructions with the same mnemonic.
This does not quite fix the churn issue yet: While files with the generated
checks will be immune to the numbers changing, the update script test
still suffers from this problem since the number is encoded in the
FileCheck variable name. I plan to address this in a follow-up patch.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D125307
To avoid test churn when backends add/rename new instructions/registers,
it makes sense to scrub the exact MCInst/Reg number.
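A hedged sketch of what such scrubbing could look like (the regexes and helper name are illustrative):
```
import re

def scrub_mcinst_numbers(asm):
    # Hypothetical sketch: replace the exact numeric ids with FileCheck regex
    # blocks so backend renumbering does not churn the generated checks.
    asm = re.sub(r'<MCInst #\d+', '<MCInst #{{[0-9]+}}', asm)
    asm = re.sub(r'<MCOperand Reg:\d+', '<MCOperand Reg:{{[0-9]+}}', asm)
    return asm
```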
Differential Revision: https://reviews.llvm.org/D125305
As the subject says: we have a lot of tests that are driven by the LoopVectorizer's debug output, but we don't have any meaningful way to autogenerate check lines in them, which means that an insurmountable amount of manual work is required when modifying the relevant cost models. That is not sustainable, so this patch presents a solution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D121133
This patch makes it possible to generate NVPTX assembly check lines with the update_llc_test_checks.py utility.
Differential Revision: https://reviews.llvm.org/D122986
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.
Differential Revision: https://reviews.llvm.org/D120714
Re-commit of 32e8b550e5
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes.
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards which this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
Currently the return address ABI registers s[30:31], which fall in the call-clobbered register range, are added as live-in on the function entry to preserve their value when we have calls, so that they get saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame, and the above approach makes it difficult to track the return address when the CFI information is emitted during frame lowering, since that would require understanding the control flow.
This patch moves the return address ABI registers s[30:31] into the callee-saved register range and stops adding them as live-in, so that the CFI machinery will know where the return address resides when the CSR save/restore happens during frame lowering.
Doing the above poses an issue: the return instruction now uses the undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction behind the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until `SI_RETURN` gets lowered to `S_SETPC_B64_return` during `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
Previously we used sra+add+xor if ADDCARRY was supported. This changes to sra+xor+sub if SUBCARRY is available.
This is consistent with the recent change to the default expansion
in LegalizeDAG.
Differential Revision: https://reviews.llvm.org/D121039
We can't just split on spaces; that's not going to give us the same argv we'd have gotten from the shell, since an argument could be inside a quoted string. We must actually parse the command line as argv.
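A minimal sketch of the difference, using Python's standard shlex module (the wrapper name and the example command are illustrative):
```
import shlex

def parse_run_command(cmd):
    # Hypothetical sketch: build argv the way a shell would, so quoted
    # arguments stay intact, instead of naively splitting on spaces.
    return shlex.split(cmd)

cmd = 'llc -mtriple=x86_64 -mattr="+sse2,+avx" %s'
assert cmd.split(' ')[2] == '-mattr="+sse2,+avx"'        # naive split keeps the quotes
assert parse_run_command(cmd)[2] == '-mattr=+sse2,+avx'  # shlex strips them
```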
It caused builds targeting iOS to assert with:
(StackSize == 0 && "We already have the CFA offset!"),
function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
See the comment on the code review for a reproducer.
> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5.
body_start was never used, resulting in the first filtered line being skipped.
Fixes the --filter option introduced in D117694.
Differential Revision: https://reviews.llvm.org/D119704
Add a check on run lines to pick up isel options in llc commands and allow generating check lines for final isel output rather than assembly. If the llc command line contains -debug-only=isel, update_llc_test_checks.py will try to scrub isel output; otherwise, the script falls back on the default behaviour, which is to scrub assembly output instead.
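A hedged sketch of that dispatch (the function and parameter names are illustrative, not the script's actual API):
```
def pick_output_scrubber(llc_cmd, scrub_asm, scrub_isel):
    # Hypothetical sketch: choose which scrubber to run based on the RUN line.
    if '-debug-only=isel' in llc_cmd:
        return scrub_isel
    return scrub_asm
```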
The motivation for this change is to allow update_llc_test_checks.py to autogenerate checks of instruction selection results. In this way, we can detect errors at an earlier stage, before the compilation goes all the way to assembly. It is an example of having some transparency into the stages between IR and assembly. These generated tests are almost like "unit tests" of the isel stage.
This patch only implements the initial change to differentiate isel output from assembly output for Lanai. Other targets are not supported for isel check generation at the moment, although adding support only requires implementing the function regex and scrubber for the corresponding targets. The Lanai implementation was chosen mainly for the simplicity of demonstrating the difference between isel checks and asm checks.
This patch also does not include the implementation of the function prefix, which is required for the generated isel checks to pass. I will put up a follow-up revision for the function prefix change to complete isel support.
Reviewed By: Flakebi
Differential Revision: https://reviews.llvm.org/D119368
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes.
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards which this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
Previously we used sra (X, size(X)-1); xor (add (X, Y), Y).
By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.
Some X86 tests for Z - abs(X) seem to have improved as well.
Other targets look to be a wash.
I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.
This is an alternative to D119099 which was focused on RISCV only.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D119171
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information about whether we want just some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of *not* having a finite number of `SP` adjustments is AArch64 stack untagging (MTE), where in some cases the compiler can modify `SP` in a loop).
Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done; they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind table size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (defaults to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
Re-add filtering options with fixes for failed tests. We were not passing the is_filtered argument in all check generator calls in update_cc_test_checks.py.
Enhance the various update_*_test_checks.py tools to allow filtering the tool output with regular expressions. The --filter option will emit only tool output lines matching the given regular expression, while the --filter-out option will emit only tool output lines not matching the given regular expression. Filters are applied in order of appearance on the command line (or in UTC_ARGS) and the first matching filter terminates the search.
This allows test authors to create more focused tests by removing irrelevant
tool output and checking only the pieces of output necessary to test the desired
functionality.
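A minimal sketch of that ordering (the helper name and data shapes are assumptions, not the scripts' real types): filters are checked in command-line order and the first one that matches decides whether a line is kept.
```
import re

def line_is_kept(line, filters):
    # Hypothetical sketch: `filters` is a list of (regex, is_filter_out)
    # pairs in command-line / UTC_ARGS order.
    for regex, is_filter_out in filters:
        if re.search(regex, line):
            # The first matching filter terminates the search.
            return not is_filter_out
    # Unmatched lines survive only when every filter is a --filter-out.
    return all(is_filter_out for _, is_filter_out in filters)
```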
Differential Revision: https://reviews.llvm.org/D117694
Enhance the various update_*_test_checks.py tools to allow filtering the tool output with regular expressions. The --filter option will emit only tool output lines matching the given regular expression, while the --filter-out option will emit only tool output lines not matching the given regular expression. Filters are applied in order of appearance on the command line (or in UTC_ARGS) and the first matching filter terminates the search.
This allows test authors to create more focused tests by removing irrelevant
tool output and checking only the pieces of output necessary to test the desired
functionality.
Differential Revision: https://reviews.llvm.org/D117694
By reordering the objects on the stack frame after looking at their users, a better utilization of displacement operands results. This means fewer Load Address instructions are needed to access these objects. This is important for very large functions where otherwise small changes could cause many more (or fewer) accesses to go out of range.
Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115690
The UpdateTestChecks tool itself does not care which pass manager is used in the opt invocation. So the lit tests that verify the behavior of the UpdateTestChecks tool are updated to use the new-PM syntax (-passes=) when specifying the pass pipeline in the test cases used for verifying the tool.
Differential Revision: https://reviews.llvm.org/D114517
Add an alias of `addi [x], zero, imm` to generate the pseudo instruction li, which makes assembly much more readable. Existing tests can be updated by running the script `llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692
CMOVGE reads SF and OF. CMOVNS only reads SF. This matches other recent changes to use a single flag where possible. It also matches gcc codegen.
I believe this technically changes whether the conditional move happens on INT_MIN, but for INT_MIN both registers are the same so it doesn't matter.
Differential Revision: https://reviews.llvm.org/D111826
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.
The idea is that in-order cpus require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.
When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.
A lot of existing tests have been updated. This is a summary of the important
differences:
- Most changes are the same instructions in a different order.
- Sometimes this leads to very minor inefficiencies, such as requiring
an extra mov to move variables into r0/v0 for the return value of a test
function.
- misched-fusion.ll was no longer fusing the pairs of instructions it
should, as per D110561. I've changed the schedule used in the test
for now.
- neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
the different latencies. This seems fine to me.
- Some SVE tests do not always remove movprfx where they did before due
to different register allocation giving different destructive forms.
- The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
produce two LDR where they previously produced an LDP due to
store-pair-suppress kicking in.
- arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP.
- Some tests such as arm64-neon-mul-div.ll and
ragreedy-local-interval-cost.ll have more, less or just different
spilling.
- In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly, if I switch this to use any other schedule even less is outlined.
Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen, places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
On MIPS, functions with exception handling code emit an additional temporary label at the start of the function (due to UseAssignmentForEHBegin):
_Z8do_catchv: # @_Z8do_catchv
.Ltmp3:
.set .Lfunc_begin0, .Ltmp3
.cfi_startproc
.cfi_personality 128, DW.ref.__gxx_personality_v0
.cfi_lsda 0, .Lexception0
.frame $c11,48,$c17
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# %bb.0: # %entry
The `[^:]*` regex was terminating the search after .Ltmp<N>: and therefore
not detecting functions with exception handling.
Reviewed By: atanasyan, MaskRay
Differential Revision: https://reviews.llvm.org/D100027
Make the update_llc_test_checks script test independent of llc behavior by using cat with static files to simulate llc output. This allows changing llc without breaking the script test case.
The update script is executed in a temporary directory, so the
llc-generated assembly files are copied there. %T is deprecated, but it
allows copying a file with a predictable filename.
Differential Revision: https://reviews.llvm.org/D110143
Previously the script emitted output using plain CHECK directives. This
can result in a test passing even if there are some instructions between
CHECK directives that should have been removed. It also makes debugging
tests that have the output in a different order more difficult since
FileCheck can match with a later line and then complain about the "wrong"
directive not being found.
This will cause quite large diffs when updating existing tests, but I'm not sure we need an opt-in flag here.
Depends on D109765 (pre-commit tests)
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D109767