llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	e1e4bf174b	[DAGCombine] Prevent the transform of combine for multi-use operand The test is based on a miscompile example in: https://llvm.org/PR51321 Differential Revision: https://reviews.llvm.org/D107692	2021-09-06 15:30:32 -04:00
Jonas Paulsson	118997d8e9	[SelectionDAGBuilder] Bugfix in visitInlineAsm() In case of a virtual register tied to a phys-def, the register class needs to be computed. Make sure that this works generally also with fast regalloc by using TLI.getRegClassFor() whenever possible, and make only the case of 'Untyped' use getMinimalPhysRegClass(). Fixes https://bugs.llvm.org/show_bug.cgi?id=51699. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109291	2021-09-06 17:46:31 +02:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison greater than -1, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on whether the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
Fangrui Song	e03c8d309a	[AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC The temporary object was used as a workaround when the target parser may change STI. D14346 made the MCSubtargetInfo argument to createMCAsmParser const, so we no longer need the temporary object.	2021-09-04 10:50:10 -07:00
Konstantin Schwarz	90d5298759	[GlobalISel] Add convenience constructors to MemDesc This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D109161	2021-09-03 12:52:18 +02:00
Chen Zheng	34badc409c	Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount." This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and is not a right patch according to comments in D91724 This reverts commit `42eaf4fe0a`.	2021-09-03 02:55:43 +00:00
Jessica Paquette	844d8e0337	[GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1 This adds the following combines: ``` x = ... 0 or 1 c = icmp eq x, 1 -> c = x ``` and ``` x = ... 0 or 1 c = icmp ne x, 0 -> c = x ``` When the target's true value for the relevant types is 1. This showed up in the following situation: https://godbolt.org/z/M5jKexWTW SDAG currently supports the `ne` case, but not the `eq` case. This can probably be further generalized, but I don't feel like thinking that hard right now. This gives some minor code size improvements across the board on CTMark at -Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.) Differential Revision: https://reviews.llvm.org/D109130	2021-09-02 15:05:31 -07:00
Heejin Ahn	28780e59f6	[WebAssembly] Add Wasm SjLj support This adds support for SjLj using Wasm exception handling instructions: https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md This does not yet support the mixed use of EH and SjLj within a function. It will be added in a follow-up CL. This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s, except for the below: - `test_longjmp_standalone`: Uses Node - `test_dlfcn_longjmp`: Uses NodeRAWFS - `test_longjmp_throw`: Mixes EH and SjLj - `test_exceptions_longjmp1`: Mixes EH and SjLj - `test_exceptions_longjmp2`: Mixes EH and SjLj - `test_exceptions_longjmp3`: Mixes EH and SjLj Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D108960	2021-09-02 10:51:02 -07:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Fraser Cormack	ef78f2106c	[LegalizeTypes][VP] Add splitting support for binary VP ops This patch extends D107904's introduction of vector-predicated (VP) operation legalization to include vector splitting. When the result of a binary VP operation needs splitting, all of its operands are split in kind. The two operands and the mask are split as usual, and the vector-length parameter EVL is "split" such that the low and high halves each execute the correct number of elements. Tests have been added to the RISC-V target to show splitting several scenarios for fixed- and scalable-vector types. Without support for `umax` (e.g. in the `B` extension) the generated code starts to branch. Ideally a cost model would prevent their insertion in the first place. Through these tests many opportunities for better codegen can be seen: combining known-undef VP operations and for constant-folding operations on `ISD::VSCALE`, to name but a few. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107957	2021-09-02 10:15:53 +01:00
Abinav Puthan Purayil	0baace5379	[DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine(). Differential Revision: https://reviews.llvm.org/D107551	2021-09-02 11:33:14 +05:30
Roman Lebedev	f5753125f0	[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier I believe the profitability reasoning here is correct: the "sub"reg is already located within the 0'th subreg of the wider reg, so if we have a subvector insertion at index 0 into undef, then it's always free to do. After this, D109065 finally avoids the regression in D108382. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109074	2021-09-02 00:54:05 +03:00
Arthur Eubanks	52e6d70c40	[NFC] Use newly introduced *AtIndex methods Introduced in D108788. These are clearer.	2021-09-01 11:18:41 -07:00
Fraser Cormack	85fd44d7fe	[SelectionDAG][NFC] Fix typo in assertion message s/Uexpected/Unexpected.	2021-09-01 08:55:06 +01:00
Yonghong Song	89424a829f	[DWARF] Support new TAG DW_TAG_LLVM_annotation A new LLVM specific TAG DW_TAG_LLVM_annotation is added. The name is suggested by Paul Robinson ([1]). Currently, this tag is used to output __attribute__((btf_tag("string"))) annotations in dwarf. The following is an example for a global variable with two btf_tag attributes: 0x0000002a: DW_TAG_variable DW_AT_name ("g1") DW_AT_type (0x00000052 "int") DW_AT_external (true) DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c") DW_AT_decl_line (8) DW_AT_location (DW_OP_addr 0x0) 0x0000003f: DW_TAG_LLVM_annotation DW_AT_name ("btf_tag") DW_AT_const_value ("tag1") 0x00000048: DW_TAG_LLVM_annotation DW_AT_name ("btf_tag") DW_AT_const_value ("tag2") 0x00000051: NULL In the future, DW_TAG_LLVM_annotation may encode other type of non-string const value. [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html Differential Revision: https://reviews.llvm.org/D106621	2021-08-31 19:22:17 -07:00
Stanislav Mekhanoshin	d170945bb2	[RegAlloc] Immediately delete dead instructions with live uses When RA eliminates a dead def it can either immediately delete the instruction itself or replace it with KILL to defer the actual removal. If this instruction has a virtual register use killing the register it will shrink the LI of the use. However, if the LI covers the instruction and extends beyond it the shrink will not happen. In fact it is impossible to shrink such a use because the KILL is still using it. If the LI of the use is later split at the KILL and the KILL itself is eliminated after that point, the new live segment ends up at an invalid slot index. This extremely rare condition was hit after D106408 which has enabled rematerialization of such instructions. The replacement with KILL is only done for rematerialized defs which became dead and such rematerialization did not generally happen before. The patch deletes an instruction immediately if it is a result of rematerialization and has such a use. An alternative would be to prohibit a split at a KILL instruction, but it looks like it is better to split a live range rather than keep a killed instruction just in case it can be rematerialized further. Fixes PR51655. Differential Revision: https://reviews.llvm.org/D108951	2021-08-31 13:46:00 -07:00
Jessica Paquette	94d3ff09cf	[GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization As noted in the comments in D108227, using G_FPTOSI produces wrong results for G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types. Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG). GlobalISel does not distinguish between integer and FP types, so a bitcast would be meaningless here.	2021-08-31 10:26:42 -07:00
Hussain Kadhem	524ded7d01	[VP] implementation of sdag support for VP memory intrinsics Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105871	2021-08-31 17:01:50 +02:00
Nemanja Ivanovic	84d4ed1761	Revert "[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item" This reverts commit `0a6fad754e`. It caused failures on a number of PowerPC bots.	2021-08-31 09:24:50 -05:00
Craig Topper	201f6446da	[LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf. Instead of splitting off the fp16 to float conversion and generating a libcall, we should split the operation into fp16 to float and float to integer operations. This will allow the float to integer conversion to go through any custom handling the target has. If the target doesn't have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/ ExpandIntRes_FP_TO_UINT automatically to create the libcall. This avoids generating libcalls on 32-bit X86. These library functions may not exist in 32-bit libgcc. At least for LLVM, we never generate them when hardware floating point instructions are available. Differential Revision: https://reviews.llvm.org/D108933	2021-08-30 13:12:59 -07:00
Bjorn Pettersson	789f01283d	[SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero When expanding a SMULFIXSAT ISD node (usually originating from a smul.fix.sat intrinsic) we've applied some optimizations for the special case when the scale is zero. The idea has been that it would be cheaper to use an SMULO instruction (if legal) to perform the multiplication and at the same time detect any overflow. And in case of overflow we could use some SELECT:s to replace the result with the saturated min/max value. The only tricky part is to know if we overflowed on the min or max value, i.e. if the product is positive or negative. Unfortunately the implementation has been incorrect as it has looked at the product returned by the SMULO to determine the sign of the product. In case of overflow that product is truncated and won't give us the correct sign bit. This patch is adding an extra XOR of the multiplication operands, which is used to determine the sign of the non truncated product. This patch fixes PR51677. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D108938	2021-08-30 22:08:26 +02:00
Chih-Ping Chen	070090cfa5	[DebugInfo] Remove the restriction on the size of DIStringType in DebugHandlerBase::isUnsignedDIType. Differential Revision: https://reviews.llvm.org/D108559	2021-08-30 15:36:54 -04:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger than 32 bits. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Hongtao Yu	f39256e3a5	[CSSPGO] Avoid repeatedly computing md5 hash code for pseudo probe inline contexts. Md5 hashing is expensive. Use a hash map to look up already computed GUIDs for dwarf names. Saw a 2% build time improvement on an internal large application. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D108722	2021-08-30 10:11:47 -07:00
Kazu Hirata	c50faffb4e	[llvm] Remove redundant calls to str() and c_str() (NFC) Identified with readability-redundant-string-cstr.	2021-08-30 09:05:05 -07:00
Craig Topper	705d005781	[DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate. The check for whether a rotate is possible occurs before the memory legality checks for the integer type. So it's possible we decide we can use a rotate, but then fail the legality checks. If that happens we should not fall back to a vector type. This triggers an assertion in the rotate handling when it finds a vector type instead of an integer type. In theory we could use a shufflevector in place of the rotate, but right now I'd just like to fix the crash. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108839	2021-08-30 08:47:15 -07:00
Djordje Todorovic	86f5288eae	[LiveDebugValues] Cleanup Transfers when removing Entry Value If we encounter a new debug value, describing the same parameter, we should stop tracking the parameter's Entry Value. At that point, in some cases, the Transfer which uses the parameter's Entry Value, is already emitted. Thanks to the RemoveRedundantDebugValues pass, many problems with incorrect instruction order and number of DBG_VALUEs are fixed. However, we still cannot rely on the rule that each new debug value is set by the previous non-debug instruction in Machine Basic Block. When new parameter debug value triggers removal of Backup Entry Value for the same parameter, do the cleanup of Transfers emitted from Backup Entry Values. Get the Transfer Instruction which created the new debug value and search for debug values already emitted from the to-be-deleted Backup Entry Value and attached to the Transfer Instruction. If found, delete the Transfer and remove "primary" Entry Value Var Loc from OpenRanges. This patch fixes PR47628. Patch by Nikola Tesic. Differential revision: https://reviews.llvm.org/D106856	2021-08-30 14:00:41 +02:00
Simon Pilgrim	7c25a32840	Fix MSVC "signed/unsigned mismatch" comparison warning. NFCI.	2021-08-30 12:11:09 +01:00
“bhkumarn”	0a6fad754e	[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran namelist variables. DICompositeType is extended to support this fortran feature. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D108553	2021-08-30 13:40:39 +05:30
Matt Arsenault	1494298b51	GlobalISel: Remove check for empty functions as these are invalid IR	2021-08-27 09:27:06 -04:00
Carl Ritson	5d9de3ea18	[DAGCombine] Allow FMA combine with both FMA and FMAD Without this change only the preferred fusion opcode is tested when attempting to combine FMA operations. If both FMA and FMAD are available then FMA ops formed prior to legalization will not be merged post legalization as FMAD becomes the preferred fusion opcode. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108619	2021-08-27 19:49:35 +09:00
Matt Arsenault	3fdcd9bb13	GlobalISel: Add CallBase to CallLoweringInfo The DAG version has this, and is necessary for call lowering to take advantage of any attributes at the call site.	2021-08-26 21:09:11 -04:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
Andrew Wei	c9066c5d37	[CGP] Fix the crash for combining address mode when having cyclic dependency In the combination of addressing modes, when replacing the matched phi nodes, sometimes the phi node to be replaced has been modified. For example, there’s matcher set [A, B] and [C, A], which will have cyclic dependency: A is replaced by B and C will be replaced by A. Because we tried to match new phi node to another new phi node, we should ignore new phi nodes when mapping new phi node to old one. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D108635	2021-08-26 22:52:42 +08:00
Jay Foad	985eb25546	[MachineScheduler] Fix tracing Consistently print a newline before "RegionInstrs:".	2021-08-26 09:27:01 +01:00
Heejin Ahn	2f88a30ca6	[WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC) Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share many steps: 1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB 2. Handle `setjmp` callsites 3. Handle `longjmp` callsites 4. Cleanup and update SSA 1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the step 2 is different. This CL extracts the current Emscripten SjLj's longjmp callsites handling into a function. The reason to make this a separate CL is, without this, the diff tool cannot compare things well in the presence of moved code and added code in the followup Wasm SjLj CL, and it ends up mixing them together, making the diff unreadable. Also fixes some typos and variable names. So far we've been calling the buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in the man page for those functions is `env`, so updated them to be consistent. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D108728	2021-08-25 15:45:38 -07:00
Heejin Ahn	c2c9a3fd9c	[WebAssembly] Rename wasm.catch.exn intrinsic back to wasm.catch The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp buffer and return value), later to catch longjmps. But because we decided not to use multivalue support at the moment, we are going to use one intrinsic that returns a single value for both exceptions and longjmps. And even if it's not for that, I now think the naming of `wasm.catch.exn` is a little weird, because the intrinsic can still take a tag immediate, which means it can be used for anything, not only exceptions, as long as that returns a single value. This partially reverts D107405. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D108683	2021-08-25 14:19:22 -07:00
Sanjay Patel	e728d1a3e8	[DAGCombiner] create binop nodes with all of expected values This is another bug exposed by https://llvm.org/PR51612 (and the one that triggered the initial assertion) in the report. That example was suppressed with: `985b48f183` ...but these would still crash because we created nodes like UADDO without the expected 2 output values.	2021-08-25 16:14:22 -04:00
Sanjay Patel	985b48f183	[DAGCombiner] check uses more strictly on select-of-binop fold There are 2 bugs here: 1. We were not checking uses of operand 2 (the false value of the select). 2. We were not checking for multiple uses of nodes that produce >1 result. Correcting those is enough to avoid the crash in the reduced test based on: https://llvm.org/PR51612 The additional use check on operand 0 (the condition value of the select) should not strictly be necessary because we are only replacing one use with another (whether it makes performance sense to do the transform with that pattern is not clear). But as noted in the TODO, changing that uncovers another bug. Note: there's at least one more bug here - we aren't propagating EVTs correctly, but I plan to fix that in another patch.	2021-08-25 14:14:41 -04:00
Nick Desaulniers	846e562dcc	[Clang] add support for error+warning fn attrs Add support for the GNU C style __attribute__((error(""))) and __attribute__((warning(""))). These attributes are meant to be put on declarations of functions that should not be called. They are frequently used to provide compile time diagnostics similar to _Static_assert, but which may rely on non-ICE conditions (i.e. relying on compiler optimizations). This is also similar to diagnose_if function attribute, but can diagnose after optimizations have been run. While users may instead simply call undefined functions in such cases to get a linkage failure from the linker, these provide a much more ergonomic and actionable diagnostic to users and do so at compile time rather than at link time. Users may instead be able to use inline asm .err directives. These are used throughout the Linux kernel in its implementation of BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be converted to use _Static_assert because many of the parameters are not ICEs. The Linux kernel still needs to be modified to make use of these when building with Clang; I have a patch that does so, which I will send once this feature has landed. To do so, we create a new IR level Function attribute, "dontcall" (both error and warning boil down to one IR Fn Attr). Then, similar to calls to inline asm, we attach a !srcloc Metadata node to call sites of such attributed callees. The backend diagnoses these during instruction selection, while we still know that a call is a call (vs say a JMP that's a tail call) in an arch agnostic manner. The frontend then reconstructs the SourceLocation from that Metadata, and determines whether to emit an error or warning based on the callee's attribute. Link: https://bugs.llvm.org/show_bug.cgi?id=16428 Link: https://github.com/ClangBuiltLinux/linux/issues/1173 Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D106030	2021-08-25 10:34:18 -07:00
Jeremy Morse	0116ed0069	[DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing optimised code -- however, it's much slower when analysing code compiled -O0. To avoid this: don't use instruction referencing for -O0 functions. In the "pure" case of unoptimised code, this won't really harm the debugging experience because most variables won't have been promoted off the stack, so can't go missing. It becomes more complicated when optimised code is inlined into functions marked optnone; however these are rare, and as -O0 doesn't run many optimisations there should be little damage to the debug experience as a result. I've taken the opportunity to refactor testing for instruction-referencing into a MachineFunction method, which seems the most appropriate place to put it. Differential Revision: https://reviews.llvm.org/D108585	2021-08-25 15:10:36 +01:00
Peilin Guo	4c4dbeeeea	[DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant multiple of the known-minimum vector length of the result type. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107795	2021-08-25 19:33:39 +08:00
Jeremy Morse	cc1e87bf55	[DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI Stack slot colouring adds "weight" to slots if a non-dbg-value instruction refers to it. This, unfortunately, means that DBG_PHI instructions can have an effect on codegen. The fix is very simple, replace isDebugValue with isDebugInstr. The regression test contains a scenario that reproduces this problem; I've represented both normal-debug mode and instr-ref debug mode instructions in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with sed to test that the two different modes produce the same behaviour. Differential Revision: https://reviews.llvm.org/D108627	2021-08-25 12:04:59 +01:00
Konstantin Schwarz	4b4bc1ea16	[GlobalISel] Do not generate illegal G_SEXTLOADs after legalization The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check, leading to the generation of potentially illegal G_SEXTLOADs when run after legalization. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108626	2021-08-25 10:13:39 +02:00
Vang Thao	549f6a819a	[MachineCopyPropagation] Check CrossCopyRegClass for cross-class copies On some AMDGPU subtargets, copying to and from AGPR registers using another AGPR register is not possible. An intermediate VGPR register is needed for an AGPR to AGPR copy. This is an issue when machine copy propagation forwards a COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is removing a cross class copy that may have been optimized by previous passes and potentially creating an unoptimized cross class copy later on. To avoid this issue, check CrossCopyRegClass if a different register class will be needed for the copy. If so then avoid forwarding the copy when the destination does not match the desired register class and if the original copy already matches the desired register class. Issue seen while attempting to optimize another AGPR to AGPR issue: Live-ins: $agpr0 $vgpr0 = COPY $agpr0 $agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0 $agpr2 = COPY $vgpr0 $agpr3 = COPY $vgpr0 $agpr4 = COPY $vgpr0 After machine-cp: $vgpr0 = COPY $agpr0 $agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0 $agpr2 = COPY $agpr0 $agpr3 = COPY $agpr0 $agpr4 = COPY $agpr0 Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR copies. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each copy when the prior VGPR->AGPR copy was already optimal. Reviewed By: lkail, rampitec Differential Revision: https://reviews.llvm.org/D108011	2021-08-24 21:22:36 -07:00
Stanislav Mekhanoshin	92c1fd19ab	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-24 11:09:02 -07:00
Simon Pilgrim	194b08000c	[DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.	2021-08-24 15:28:30 +01:00
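
The two folds described in commits 8523fb96a6 and 79845ed6df above can be checked exhaustively at a small bit width. The following is only an illustrative sketch, not LLVM code: the 8-bit width and all helper names (`to_signed`, `ashr`, `select_form`, `xor_form`, `setcc_eq_form`, `setcc_lt_form`, `folds_hold`) are assumptions made for this example. It models each DAG pattern as a plain Python function and compares the two sides of each fold for every input.

```python
BW = 8                # illustrative bit width (assumption; small enough to enumerate)
MASK = (1 << BW) - 1  # all-ones value, i.e. -1 as a BW-bit pattern


def to_signed(v):
    """Interpret the low BW bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BW) if v >> (BW - 1) else v


def ashr(x, amount):
    """Arithmetic shift right of a BW-bit value, result kept to BW bits."""
    return (to_signed(x) >> amount) & MASK


def select_form(x, c):
    """Models: select_cc setgt X, -1, C, ~C"""
    return (c if to_signed(x) > -1 else ~c) & MASK


def xor_form(x, c):
    """Models: xor (ashr X, BW-1), C"""
    return ashr(x, BW - 1) ^ (c & MASK)


def setcc_eq_form(x):
    """Models: setcc seteq (ashr X, BW-1), -1"""
    return ashr(x, BW - 1) == MASK


def setcc_lt_form(x):
    """Models: setcc setlt X, 0"""
    return to_signed(x) < 0


def folds_hold():
    """Compare both sides of each fold for every BW-bit X (and every C)."""
    values = range(1 << BW)
    select_ok = all(
        select_form(x, c) == xor_form(x, c) for x in values for c in values
    )
    setcc_ok = all(setcc_eq_form(x) == setcc_lt_form(x) for x in values)
    return select_ok and setcc_ok
```

Calling `folds_hold()` enumerates all 256 values of X (and all 256 constants C for the select fold). The shift by BW-1 yields all ones for a negative X and zero otherwise, so the xor either inverts C or leaves it unchanged. The Alive2 link in the commit message remains the authoritative proof for arbitrary widths.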


31133 Commits