llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	0131498402	GlobalISel: Remove dead code Generic code should probably not introduce G_INSERT/G_EXTRACT. The mirror unpackRegs should also be removed, but AMDGPU still has a use remaining which needs to be fixed.	2021-03-01 17:06:43 -05:00
Sanjay Patel	154c47dc06	[SDAG] add helper for select->logic folds; NFC This set of transforms should be extended to handle vector types.	2021-03-01 16:24:15 -05:00
Wouter van Oortmerssen	a0f4526836	[WebAssembly] Fix split-dwarf not emitting DW_OP_WASM_location correctly It was using the regular path for target indices that uses uleb, but TI_GLOBAL_RELOC needs to be uint32_t. Introduced here: https://reviews.llvm.org/D85685 Fixes: https://github.com/emscripten-core/emscripten/issues/13240 Differential Revision: https://reviews.llvm.org/D97564	2021-03-01 11:53:30 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Craig Topper	e745f7c563	[LegalizeTypes] Improve ExpandIntRes_XMULO codegen. The code previously used two BUILD_PAIRs to concatenate the two UMULO results with 0s in the lower bits to match original VT. Then it created an ADD and a UADDO with the original bit width. Each of those operations needs to be expanded since they have illegal types. Since we put 0s in the lower bits before the ADD, the lower half of the ADD result will be 0. So the lower half of the UADDO result is solely determined by the other operand. Since the UADDO needs to be split in half, we don't really need an operation for the lower bits. Unfortunately, we don't see that in type legalization and end up creating something more complicated and DAG combine or lowering aren't always able to recover it. This patch directly generates the narrower ADD and UADDO to avoid needing to legalize them. Now only the MUL is done on the original type. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97440	2021-03-01 09:54:32 -08:00
Matt Arsenault	361cfdf228	GlobalISel: Verify G_CONCAT_VECTORS has at least 2 sources	2021-03-01 09:10:36 -05:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
Simon Pilgrim	9dd83f5ee8	[DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle. Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.	2021-03-01 10:42:11 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
Serguei Katkov	65fb706231	[Statepoint Lowering] Consider dead deopt gc values together with other gc values Currently dead gc value mentioned in the deopt section are not listed in gc section and so are processed separately. With this CL all deopt gc values are considered as base pointers and processed in the same way as other gc values. The fact that deopt gc pointer is a base pointer was used all the time but it is explicitly documented here by putting the value in SI.Base. The idea of the patch comes from Philip Reames. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97554	2021-03-01 17:23:02 +07:00
Simon Pilgrim	64c41301ce	[DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI. Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.	2021-03-01 09:42:00 +00:00
Max Kazantsev	9fac8496ea	[NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow Current callers do not call it with such argument, so this is NFC. But for further changes, it can be very useful to detect such cases.	2021-03-01 13:24:01 +07:00
Max Kazantsev	8d835f42a5	[NFC] Introduce function getIVStep for further reuse	2021-03-01 13:04:56 +07:00
Max Kazantsev	fdbad5e5ac	[NFC] Whitespace fix	2021-03-01 12:14:03 +07:00
Max Kazantsev	2892fcc204	[NFC] Factor out IV detector function for further reuse	2021-03-01 12:11:54 +07:00
Serguei Katkov	06c5119c76	[Statepoint lowering] Require spill of deopt value in case its type is not legal If the type of the deopt operand has an illegal type and we want to use register for it then it needs to be legalized. This is not supported currently by legalizer and it is not actually clear how to legalize this type of values. Instead we just spill such values and use spill slot location in statepoint. Originally tests were created by Philip Reames. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97541	2021-03-01 10:23:53 +07:00
Craig Topper	5de09ef02e	[DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg. Peeking through AND is only valid if the input to both shifts is the same. If the inputs are different, then the original pattern ORs the two values when the masked shift amount is 0. This is ok if the values are the same since the OR would be a NOP, which is why it's ok for rotate. Fixes PR49365 and reverts PR34641 Differential Revision: https://reviews.llvm.org/D97637	2021-02-28 12:58:00 -08:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
Craig Topper	ca5247bb17	[DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits. Even if the first computeKnownBits call doesn't have any zero bits it is possible the other operand has bitwidth-1 leading zero. In that case overflow is still impossible. So always call computeKnownBits for both operands.	2021-02-28 08:26:22 -08:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Craig Topper	eea53b142d	[DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible. Using ComputeNumSignBits or computeKnownBits we might be able to determine that overflow is impossible. This especially helps after type legalization if the type was promoted from a type with half the bits or more. Type legalization conservatively creates a promoted smulo/umulo and an overflow check for the promoted bits. The overflow from the promoted smulo/umulo is ORed with the result of the promoted bits overflow check. Proving that the promoted smulo/umulo can never overflow will leave us with just the promoted bits overflow check. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97160	2021-02-26 14:50:03 -08:00
James Y Knight	6de6455752	Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available. These locations were missed as part of adding alignment to the instructions, and were still making their own alignment assumptions.	2021-02-26 15:06:15 -05:00
Philip Reames	0832a58e22	[cgp] Minor code improvement - reuse an existing named helper [NFC]	2021-02-26 11:51:32 -08:00
Mircea Trofin	3e992326a5	[NFC][regalloc] const-ed APIs, using MCRegister instead of unsigned	2021-02-26 09:54:20 -08:00
Mircea Trofin	a2bfc43ae1	[NFC] Const-ed 2 APIs in VirtRegMap	2021-02-26 09:32:42 -08:00
James Y Knight	740e69b6fd	Fix assert to use getTypeStoreSize instead of getPrimitiveSizeInBits, per comment on D97223.	2021-02-26 11:08:00 -05:00
Simon Pilgrim	aefe8f2f6c	[DAG] Fold vXi1 multiplies -> and This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math. Mentioned in D97478 post review.	2021-02-26 11:46:12 +00:00
Simon Pilgrim	73adc26ac0	[DAG] expandAddSubSat - break if-else chain. NFCI. Fix styleguide issue - each if() block always returns so we don't need to make them a if-else chain.	2021-02-26 11:02:08 +00:00
Chen Zheng	d39bc36b1b	[debug-info] refactor emitDwarfUnitLength remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we can make caller of emitDwarfUnitLength easier. Reviewed By: MaskRay, dblaikie, ikudrin Differential Revision: https://reviews.llvm.org/D96409	2021-02-25 21:00:25 -05:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those changes throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Simon Pilgrim	9490b9f14b	[DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode() As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.	2021-02-25 17:49:26 +00:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
David Sherwood	87dbcd8865	[CodeGen] Canonicalise adds/subs of i1 vectors using XOR When calling SelectionDAG::getNode() to create an ADD or SUB of two vectors with i1 element types we can canonicalise this to use XOR instead, where 1+1 is treated as wrapping around to 0 and 0-1 wraps to 1. I've added the following tests for SVE targets: CodeGen/AArch64/sve-pred-arith.ll and modified some X86 tests to reflect the much simpler codegen required. Differential Revision: https://reviews.llvm.org/D97276	2021-02-25 10:31:26 +00:00
Craig Topper	fe50be12c8	[LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported. Rather than converting 3 signbits to bools and comparing them, we can do bitwise logic on the whole vector and convert the resulting sign bit to a bool at the end. This is still a different algorithm than what we do in LegalizeDAG through expandSADDOSSUBO. That algorithm needs to know that the RHS of SSUBO is > 0, but that's costly when the type is split. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97325	2021-02-24 10:05:38 -08:00
Sander de Smalen	5e19208d96	[InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. This fixes the types of a few more cost variables to be of type InstructionCost.	2021-02-24 14:30:03 +00:00
Simon Pilgrim	8082bfe7e5	[DAG] Add basic mul-with-overflow constant folding support As noticed on D97160	2021-02-24 11:09:02 +00:00
Petr Hosek	11a53f47fb	Revert "[InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF" This reverts commit `6b286d93f7` because in some cases when the optimizer evaluates the global initializer, __llvm_prf_cnts may not be entirely zero initialized.	2021-02-24 00:41:43 -08:00
Craig Topper	cb6fc4b0a3	[LegalizeIntegerTypes] Use GetExpandedInteger instead of SplitInteger in ExpandIntRes_XMULO. We know the input is going to be expanded as well, so we should just ask for the already expanded operands. Otherwise we create nodes that are just going to need to be legalized.	2021-02-23 23:53:45 -08:00
Chen Zheng	be5d92e37e	[Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class We may need to do some customization for DWARF unit length in DWARF section headers for some targets for some code generation path. For example, for XCOFF in assembly path, AIX assembler does not require the debug section containing its debug unit length in the header. Move emitDwarfUnitLength to MCStreamer class so that we can do customization in different Streamers Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D95932	2021-02-23 21:29:05 -05:00
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain basics on what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationship between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationship in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop consists of BBs that are 1. dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.)
But now, we have cases in exceptions when EHPad B is the unwind destination for EHPad A, even if EHPad B is dominated by EHPad A it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change `dominates` condition to `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to `Preferred` list and then some of them are pushed to `Ready` list, so here we only need to guard condition for `Preferred` list.) When adding a BB to `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB.
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
Amara Emerson	4691405ba9	Fix a range-loop-analysis warning.	2021-02-23 14:41:08 -08:00
Heejin Ahn	445f4e7484	[WebAssembly] Disable wasm.lsda() optimization in WasmEHPrepare In every catchpad except `catch (...)`, we add a call to `_Unwind_CallPersonality`, which is a wrapper to call the personality function. (In most of other Itanium-based architectures the call is done from libunwind, but in wasm we don't have the control over the VM.) Because the personality function is called to figure out whether the current exception is a type we should catch, such as `int` or `SomeClass&`, `catch (...)` does not need the personality function call. For the same reason, all cleanuppads don't need it. When we call `_Unwind_CallPersonality`, we store some necessary info in a data structure called `__wasm_lpad_context` of type `_Unwind_LandingPadContext`, which is defined in the wasm's port of libunwind in Emscripten. Also the personality wrapper function returns some info (selector and the caught pointer) in that data structure, so it is used as a medium for communication. One of the info we need to store is the address for LSDA info for the current function. `wasm.lsda()` intrinsic returns that address. (This intrinsic will be lowered to a symbol that points to the LSDA address.) The simplest thing is to call `wasm.lsda()` every time we need to call `_Unwind_CallPersonality` and store that info in `__wasm_lpad_context` data structure. But we tried to be better than that (D77423 and some more previous CLs), so if catchpad A dominates catchpad B and catchpad A is not `catch (...)`, we didn't insert `wasm.lsda()` call in catchpad B, thinking that the LSDA address is the same for a single function and we already visited catchpad A and `__wasm_lpad_context.lsda` field would already have that value. But this can be incorrect if there is a call to another function, which also can have the personality function and LSDA, between catchpad A and catchpad B, because `__wasm_lpad_context` is a globally defined structure and the callee function will overwrite its `lsda` field.
So in this CL we don't try to do any optimizations on adding `wasm.lsda()` call; we store the result of `wasm.lsda()` every time we call `_Unwind_CallPersonality`. We can do some complicated analysis, like checking if there is a function call between the dominating catchpad and the current catchpad, but at this time it seems overkill. This deletes three tests because they all tested `wasm.lsda()` call optimization. Fixes https://github.com/emscripten-core/emscripten/issues/13548. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D97309	2021-02-23 14:38:59 -08:00
Craig Topper	eb165090bb	[LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO. This code creates 3 setccs that need to be expanded. It was creating a sign bit test as setge X, 0 which is non-canonical. Canonical would be setgt X, -1. This misses the special case in IntegerExpandSetCCOperands for sign bit tests that assumes canonical form. If we don't hit this special case we end up with a multipart setcc instead of just checking the sign of the high part. To fix this I've reversed the polarity of all of the setccs to setlt X, 0 which is canonical. The rest of the logic should still work. This seems to produce better code on RISCV which lacks a setgt instruction. This probably still isn't the best code sequence we could use here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97181	2021-02-23 09:40:32 -08:00
Jay Foad	a6be26710b	[GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC.	2021-02-23 17:08:34 +00:00
Cassie Jones	8f956a5e8f	[GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96673	2021-02-22 19:59:36 -05:00
Cassie Jones	e1532649cb	[GlobalISel] Implement narrowScalar for SADDO/SSUBO Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96672	2021-02-22 19:59:36 -05:00
Cassie Jones	c63b33b792	[GlobalISel] Implement narrowScalar for UADDO/USUBO Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96671	2021-02-22 19:59:35 -05:00
Amara Emerson	212d6a95ab	[GlobalISel] Support lowering <3 x i8> arguments in multiple parts. Differential Revision: https://reviews.llvm.org/D97086	2021-02-22 13:58:44 -08:00
Amara Emerson	69ce291bcc	[AArch64][GlobalISel] Support lowering <1 x i8> arguments. We don't yet have working codegen for the resulting unmerges, and if we did it would probably be horrible. Differential Revision: https://reviews.llvm.org/D97035	2021-02-22 13:58:44 -08:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Kazu Hirata	ffba9e596d	[CodeGen] Use range-based for loops (NFC)	2021-02-21 19:58:07 -08:00
Craig Topper	1a6c1ac686	[SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM. This also removes a pattern from RISCV that is no longer needed since the sexti32 on the LHS of the srem in the pattern implies the result is sign extended so the sign_extend_inreg should be removed in DAG combine now. Reviewed By: luismarques, RKSimon Differential Revision: https://reviews.llvm.org/D97133	2021-02-21 11:13:36 -08:00
Simon Pilgrim	38ab47c813	[DAG] Match USUBSAT patterns through zext/trunc This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form: zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit))) zext(x) > y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit))) Based on original examples: void foo(unsigned short *p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = *--p; *p = (unsigned short)(m >= max ? m-max : 0); } } Differential Revision: https://reviews.llvm.org/D25987	2021-02-21 15:26:54 +00:00
Kazu Hirata	0b417ba20f	[CodeGen] Use range-based for loops (NFC)	2021-02-20 21:46:02 -08:00
Petr Hosek	6b286d93f7	[InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF This can reduce the binary size because counters will no longer occupy space in the binary, instead they will be allocated by dynamic linker. Differential Revision: https://reviews.llvm.org/D97110	2021-02-20 14:20:33 -08:00
Simon Pilgrim	761bbed264	[DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit))) This moves the last custom x86 USUBSAT fold to generic DAGCombine. Completes PR40111 Differential Revision: https://reviews.llvm.org/D96703	2021-02-20 12:02:07 +00:00
Kazu Hirata	a205fa5cd9	[CodeGen] Use range-based for loops (NFC)	2021-02-19 22:44:14 -08:00
Pan, Tao	12edddafac	[CodeGen] Fix two dots between text section name and symbol name There is a trailing dot in text section name if it has prefix, don't add repeated dot when connect text section name and symbol name. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D96327	2021-02-20 10:15:48 +08:00
Craig Topper	baab797878	[ValueTypes] Assert if changeVectorElementType is called on a simple type with an extended element type. Previously we would use the extended implementation, but the extended implementation requires the vector type to be extended so that we can access the LLVMContext. In theory we could detect this case and use the context from the element type instead, but since I know of no cases hitting this in practice today I've done the simplest thing. Also add asserts to several extended EVT functions that assume LLVMTy is non-null. Follow from discussion in D97036 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D97070	2021-02-19 17:30:46 -08:00
Mircea Trofin	82492f24ff	[NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit VirtRegAuxInfo is an extensibility point, so the register allocator's decision on which implementation to use should be communicated to the other users - namely, LiveRangeEdit. Differential Revision: https://reviews.llvm.org/D96898	2021-02-19 07:44:28 -08:00
Simon Pilgrim	5d3930bb8f	[DAG] visitTRUNCATE - attempt to truncate USUBSAT Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))	2021-02-19 14:26:05 +00:00
Kazu Hirata	fd04f3a30c	[CodeGen] Use range-based for loops (NFC)	2021-02-18 22:46:43 -08:00
Adrian Prantl	c4ad878acb	Reset the EntryValue location flag in finalizeEntryValue. This fixes an assertion error when entry values are combined with DW_OP_LLVM_fragment.	2021-02-18 18:36:36 -08:00
Matt Arsenault	2d3d2e78d0	MIR: Fix parser crash on syntax error on first character This was calling the diagnostic printer before the context member was initialized.	2021-02-18 18:59:08 -05:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Matt Arsenault	62d946e133	GlobalISel: Merge some AMDGPU ABI lowering code to generic code AMDGPU currently has a lot of pre-processing code to pre-split argument types into 32-bit pieces before passing it to the generic code in handleAssignments. This is a bit sloppy and also requires some overly fancy iterator work when building the calls. It's better if all argument marshalling code is handled directly in handleAssignments. This handles more situations like decomposing large element vectors into sub-element sized pieces. This should mostly be NFC, but does change the generated code by shifting where the initial argument packing instructions are placed. I think this is nicer looking, since it now emits the packing code directly after the relevant copies, rather than after the copies for the remaining arguments. This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit types. This is ultimately the better option, but incompatible with the DAG. Fixing this requires more work, especially for f16.	2021-02-18 17:26:55 -05:00
Simon Pilgrim	53e83afcaf	[DAG] getTruncatedUSUBSAT - always truncate operands. NFCI. As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.	2021-02-18 21:28:55 +00:00
Guozhi Wei	66f2d09ebf	[DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2) If extload is legal, following transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2) can save one ext instruction. Differential Revision: https://reviews.llvm.org/D95086	2021-02-18 13:15:20 -08:00
Philip Reames	dcebe8ab1e	Fix a buildbot warning triggered by `1dfb06d`	2021-02-18 09:37:49 -08:00
Philip Reames	13753808f4	[verify-regalloc] Verify after allocation and before postOptimization I've now hit several cases where a mistake in the regalloc main loop caused corrupt live intervals that didn't get caught until either the next verify or during post-optimization. The latter case is rather confusing and tends to lead one down false trails, so let's catch corruption before that.	2021-02-18 09:10:50 -08:00
Philip Reames	5318d9e516	[splitkit] Add a minor wrapper function for readability [NFC]	2021-02-18 09:00:22 -08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Craig Topper	61d4d9a5d3	[TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode. CheckInteger uses an int64_t encoded using a variable width encoding that is optimized for encoding a number with a lot of leading zeros. Negative numbers have no leading zeros so use the largest encoding requiring 9 bytes. I believe it's most likely we want to check for positive and negative numbers near 0. -1 is quite common due to its use in the 'not' idiom. To optimize for this, we can borrow an idea from the bitcode format and move the sign bit to bit 0 with the magnitude stored in the upper bits. This will drastically increase the number of leading zeros for small magnitudes. Then we can run this value through VBR encoding. This gives a small reduction in the table size on all in tree targets except VE where size increased by about 300 bytes due to intrinsic ids now requiring 3 bytes instead of 2. Since the intrinsic enum space is shared by all targets this is an unfortunate consequence of where VE is currently located in the range. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96317	2021-02-18 08:53:17 -08:00
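The sign-bit trick described in the entry above can be sketched in a few lines. This is a minimal model, not the TableGen emitter itself: the function names are hypothetical, and the 7-payload-bits-per-byte VBR layout is an assumption made for illustration, so exact byte counts may differ from the figures quoted in the commit message.

```python
MASK64 = (1 << 64) - 1


def rotate_sign(value):
    """Move the sign to bit 0 and keep the magnitude in the upper bits."""
    if value >= 0:
        return value << 1
    return ((-value) << 1) | 1


def unrotate_sign(encoded):
    """Inverse of rotate_sign."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


def vbr_bytes(value):
    """Assumed VBR layout: 7 payload bits per byte, low bits first,
    with bit 7 set on every byte except the last."""
    assert value >= 0
    out = []
    while value >= 128:
        out.append((value & 127) | 128)
        value >>= 7
    out.append(value)
    return out


def plain_encode(value):
    """VBR over the raw 64-bit two's complement pattern."""
    return vbr_bytes(value & MASK64)


def rotated_encode(value):
    """VBR over the sign-rotated value."""
    return vbr_bytes(rotate_sign(value))


if __name__ == "__main__":
    for v in (-1, 1, 63, 64, 8192):
        print(v, len(plain_encode(v)), len(rotated_encode(v)))
```

In this model small negative values shrink to a single byte, while positive values lose one bit of headroom per byte boundary, which matches the trade-off the commit message describes for large intrinsic ids.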
Philip Reames	1dfb06d0b4	[regalloc] Add a couple of dump routines for ease of debugging [NFC]	2021-02-18 08:50:00 -08:00
Kazu Hirata	61efa3d93f	[CodeGen] Use range-based for loops (NFC)	2021-02-17 23:58:46 -08:00
Kazu Hirata	8e13bbca08	[CodeGen] Use ListSeparator (NFC)	2021-02-17 23:58:43 -08:00
Yang Fan	796feb6163	[MC][ELF] Fix unused variable warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp: In member function ‘virtual llvm::MCSection* llvm::TargetLoweringObjectFileELF::getSectionForLSDA(const llvm::Function&, const llvm::MCSymbol&, const llvm::TargetMachine&) const’: /llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:871:8: warning: variable ‘IsComdat’ set but not used [-Wunused-but-set-variable] 871 \| bool IsComdat = false; \| ^~~~~~~~ ```	2021-02-18 14:23:18 +08:00
Chen Zheng	5517923b1c	[XCOFF][NFC] make csect properties optional for getXCOFFSection We are going to support debug sections for XCOFF. So the csect properties are not necessary. This patch makes these properties optional. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D95931	2021-02-17 20:51:42 -05:00
Jessica Paquette	e6064a6418	[GlobalISel] Implement computeKnownBits for G_ASSERT_SEXT Implementation is the same as G_SEXT_INREG. Differential Revision: https://reviews.llvm.org/D96899	2021-02-17 14:00:36 -08:00
Jessica Paquette	26fb036559	[GlobalISel] Implement computeNumSignBits for G_ASSERT_SEXT Same implementation as G_SEXT_INREG. Add a testcase to combine-sext-inreg for a concrete example, and a testcase to KnownBitsTest. Differential Revision: https://reviews.llvm.org/D96897	2021-02-17 13:53:17 -08:00
Jessica Paquette	60aa646441	[GlobalISel] Add G_ASSERT_SEXT This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction signifies that an operation was already sign extended from a smaller type. This is useful for functions with sign-extended parameters. E.g. ``` define void @foo(i16 signext %x) { ... } ``` This adds verifier, regbankselect, and instruction selection support for G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT. Differential Revision: https://reviews.llvm.org/D96890	2021-02-17 13:10:34 -08:00
Mircea Trofin	3a030c2f2f	[NFC][RegAlloc] InlineSpiller::Original is a Register	2021-02-17 12:07:59 -08:00
Derek Schuff	1f9e551a81	[WebAssembly] Do not use EHCatchret symbols with wasm EH D94835 added support for WinEH to export public symbols pointing to basic blocks which are catchret targets for use with Windows CET. Wasm currently doesn't support public symbols to non-function code addresses (they get treated like new functions in asm but then don't lower to object files correctly). It created them unconditionally for all catchret targets. This change disables those symbols unless the exceptionHandlingType is WinEH (since they aren't used with ExceptionHandling::Wasm) Differential Revision: https://reviews.llvm.org/D96824	2021-02-17 11:22:48 -08:00
Marianne Mailhot-Sarrasin	f0ec9f1bb3	[Pipeliner] Fixed optimization remarks and debug dumps Initiation Interval value The II value was incremented before exiting the loop, and therefore when used in the optimization remarks and debug dumps it did not reflect the initiation interval actually used in Schedule. Differential Revision: https://reviews.llvm.org/D95692	2021-02-17 12:28:37 -05:00
Simon Pilgrim	87fbc06d06	[DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI. This will simplify an incoming generic implementation of D25987. I'll rebase D96703 shortly to support this.	2021-02-17 12:17:08 +00:00
Simon Pilgrim	05c64ea672	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs. Differential Revision: https://reviews.llvm.org/D96345	2021-02-17 11:42:43 +00:00
Igor Kudrin	aa84289629	[DebugInfo] Keep the DWARF64 flag in the module metadata This allows the option to affect the LTO output. Module::Max helps to generate debug info for all modules in the same format. Differential Revision: https://reviews.llvm.org/D96597	2021-02-17 17:03:34 +07:00
Sjoerd Meijer	7f3170ec19	[MachineSink] Add a loop sink limit To make sure compile-times don't regress, add an option to restrict the number of instructions considered for sinking as alias analysis can be expensive and for the same reason also skip large blocks. Differential Revision: https://reviews.llvm.org/D96485	2021-02-17 08:50:53 +00:00
Kazu Hirata	3279943adf	[CodeGen] Use range-based for loops (NFC)	2021-02-16 23:23:08 -08:00
Sriraman Tallam	d1a838babc	Basic block sections should enable function sections implicitly. Basic block sections enables function sections implicitly, this is not needed and is inefficient with "=list" option. We had basic block sections enable function sections implicitly in clang. This is particularly inefficient with "=list" option as it places functions that do not have any basic block sections in separate sections. This causes unnecessary object file overhead for large applications. This patch disables this implicit behavior. It only creates function sections for those functions that require basic block sections. Further, there was inconsistent behavior with llc as llc was not turning on function sections by default. This patch makes llc and clang consistent and tests are added to check the new behavior. This is the first of two patches and this adds functionality in LLVM to create a new section for the entry block if function sections is not enabled. Differential Revision: https://reviews.llvm.org/D93876	2021-02-16 16:27:16 -08:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
Sterling Augustine	5aa8f4c084	Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))" This reverts commit `5dfba562dd`. That commit causes an assertion failure with the following repro: typedef long b __attribute__((__vector_size__(16))); b d; b e; b __attribute__((__always_inline__)) c(b h, b i) { return (__attribute__((__vector_size__(8 sizeof(short)))) short)h + i; } j() { b k, l, m, n, o[6], p, q; m = d[5]; b r = m; b s = f(r, 8); q = s; l = d[1]; p = l; t(q); n = c(m, l); o[1] = c(s, f(p, 8)); k = __builtin_shufflevector(n, o[1], 0, 2); e = __builtin_ia32_psrlwi128(k, j); } ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c	2021-02-16 12:48:15 -08:00
Simon Pilgrim	df45c18135	[DAG] PromoteIntRes_ADDSUBSHLSAT - promote ISD::UADDSAT as clamped add Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat. I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path. Differential Revision: https://reviews.llvm.org/D96767	2021-02-16 17:37:44 +00:00
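The promotion described in the entry above, uaddsat(x,y) -> umin(add(x,y),c), can be checked exhaustively for a small type. The sketch below is a Python model under stated assumptions: an 8-bit type promoted to 32 bits, operands zero-extended, and hypothetical function names; it does not reproduce the legalizer code.

```python
def uaddsat(x, y, bits):
    """Unsigned saturating add at the given bit width."""
    limit = (1 << bits) - 1
    assert 0 <= x <= limit and 0 <= y <= limit
    return min(x + y, limit)


def promoted_uaddsat(x, y, narrow_bits=8, wide_bits=32):
    """Model of the promoted form: plain add in the wide type, then umin."""
    wide_mask = (1 << wide_bits) - 1
    c = (1 << narrow_bits) - 1      # largest value of the narrow type
    total = (x + y) & wide_mask     # cannot wrap: both operands are zero-extended
    return min(total, c)            # umin clamps to the narrow maximum


def check_all(narrow_bits=8):
    """Compare the promoted form with the narrow uaddsat for every input pair."""
    limit = 1 << narrow_bits
    return all(
        promoted_uaddsat(x, y, narrow_bits) == uaddsat(x, y, narrow_bits)
        for x in range(limit)
        for y in range(limit)
    )


if __name__ == "__main__":
    print(check_all())
```

A uaddsat performed directly at the wide width would not saturate at the narrow maximum (200 + 100 gives 300 at 32 bits), which is why the clamp against c is needed in this model.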
Craig Topper	064ada4ec6	[SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors. `fde2466171` added support for scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in addition to BUILD_VECTOR. This was used to enable UDIV/SDIV/UREM/SREM by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp. The caller there expects to call getBuildVector from the match factors. This leads to a crash right now if there is a SPLAT_VECTOR of fixed vectors since the number of vectors won't match the number of elements. To fix this, this patch updates the callers to check the opcode instead of whether the type is fixed or scalable. This assumes that only 3 opcodes are handled by matchUnaryPredicate so I've added an assertion to the final else to check that opcode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96174	2021-02-16 09:22:46 -08:00
Simon Pilgrim	5dfba562dd	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Differential Revision: https://reviews.llvm.org/D96345	2021-02-16 15:46:34 +00:00
Simon Pilgrim	420420de57	[DAG] Avoid APInt copies by directly using the APInt reference from getAPIntValue. NFCI.	2021-02-16 13:50:34 +00:00
Simon Pilgrim	dd879f7dc9	[DAG] Use APInt::extractBits instead of lshr().trunc(). NFCI. Avoids so many APInt instances by directly using the APInt reference from getAPIntValue.	2021-02-16 13:50:33 +00:00
Kazu Hirata	22f00f61dd	[CodeGen] Use range-based for loops (NFC)	2021-02-15 14:46:11 -08:00
Matt Arsenault	392e0fcfd1	GlobalISel: Handle arguments partially passed on the stack The API is a bit awkward since you need to index into an array in the passed struct. I guess an alternative would be to pass all of the individual fields.	2021-02-15 17:06:14 -05:00
Matt Arsenault	1b3d8ddeb9	CodeGen: Move function to get subregister indexes to cover a LaneMask Return the best covering index, and additional needed to complete the mask. This logically belongs in TargetRegisterInfo, although I ended up not needing it for why I originally split this out.	2021-02-15 17:05:37 -05:00
Craig Topper	eb75f250fe	[RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. Similar was done for BSWAP previously. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96681	2021-02-15 12:33:16 -08:00
Adrian Prantl	09b832e74f	Support emitting complex expressions that include entry values This patch enables AsmPrinter support for complex expressions with entry values. It shouldn't be AsmPrinter's call whether these are safe or not, but that of the pass that introduces the DW_OP_LLVM_entry_value. This patch on its own has no effect on clang. Differential Revision: https://reviews.llvm.org/D96559	2021-02-15 11:09:09 -08:00
Simon Pilgrim	e47f21da61	[DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI. This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.	2021-02-15 18:27:00 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vectors types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and 3) Combine these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Cassie Jones	97a1cdb156	[GlobalISel] Disable vector types in narrowScalarAddSub The implementation for vectors is broken and doesn't seem to be used by anything. Explicitly remove support for them, they can be added again later when they're properly implemented. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D95699	2021-02-14 18:06:32 -05:00
Cassie Jones	36246388ba	[GlobalISel] Extract a narrowScalarAddSub method. NFC Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D95426	2021-02-14 18:06:32 -05:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Kazu Hirata	d5adba10f0	[CodeGen] Use range-based for loops (NFC)	2021-02-13 20:41:39 -08:00
Simon Pilgrim	6f5a805bbb	[DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y) Alive2: https://alive2.llvm.org/ce/z/FzcrpH	2021-02-13 15:02:01 +00:00
Simon Pilgrim	0df15e5eff	[DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y) Alive2: https://alive2.llvm.org/ce/z/4nkNGh	2021-02-13 13:21:15 +00:00
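The two i1 folds listed above (saddsat/uaddsat to or, and ssubsat/usubsat to and with an inverted second operand) are small enough to verify by enumerating all four input combinations. The following is a Python model with hypothetical helper names; it treats an i1 as the bit 0 or 1, with the signed interpretation of bit 1 being -1.

```python
def clamp(value, lo, hi):
    """Saturate value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def as_signed(bit):
    """Signed reading of an i1: bit 1 means -1, bit 0 means 0."""
    return -bit


def uaddsat_i1(x, y):
    return clamp(x + y, 0, 1)


def usubsat_i1(x, y):
    return clamp(x - y, 0, 1)


def saddsat_i1(x, y):
    # Saturate in the signed range [-1, 0], then map back to a bit.
    return -clamp(as_signed(x) + as_signed(y), -1, 0)


def ssubsat_i1(x, y):
    return -clamp(as_signed(x) - as_signed(y), -1, 0)


def folds_hold():
    """Check both folds for every pair of i1 inputs."""
    for x in (0, 1):
        for y in (0, 1):
            not_y = ~y & 1
            if uaddsat_i1(x, y) != (x | y) or saddsat_i1(x, y) != (x | y):
                return False
            if usubsat_i1(x, y) != (x & not_y) or ssubsat_i1(x, y) != (x & not_y):
                return False
    return True


if __name__ == "__main__":
    print(folds_hold())
```

The Alive2 links in the two entries remain the authoritative proofs; this enumeration only illustrates why the signed and unsigned variants collapse to the same logic operation at one bit.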
Simon Pilgrim	60ba5397df	[DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops. I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later. Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary. Differential Revision: https://reviews.llvm.org/D96622	2021-02-13 12:35:10 +00:00
Simon Pilgrim	7ad0c573bd	[DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it. Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162	2021-02-13 12:00:08 +00:00
Kazu Hirata	905cf88d18	[CodeGen] Use range-based for loops (NFC)	2021-02-12 23:44:33 -08:00
Adrian Prantl	982b891905	Store the LocationKind of an entry value buffer independently from the main LocationKind (NFC) This patch hides the logic for setting the location kind of an entry value inside the begin/finalize/cancel functions. This way we get rid of the strange workaround that is currently in setLocation(). In the future, this will allow us to set the location kind of the entry value independently from the location kind of the main expression. Differential Revision: https://reviews.llvm.org/D96554	2021-02-12 16:59:39 -08:00
Jay Foad	7c749baa3a	[GlobalISel] Simpler verification of G_SEXT_INREG and G_ASSERT_ZEXT There's no need to call verifyVectorElementMatch since we already know that the source and destination types are identical. Differential Revision: https://reviews.llvm.org/D96589	2021-02-12 21:33:27 +00:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics, and a target hook for AArch64 is added, to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
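The two basic patterns named in the entry above, sub(umax(a,b),b) and sub(a,umin(a,b)), both compute an unsigned saturating subtract. A minimal Python check over all 8-bit inputs, with hypothetical function names and the bit width chosen only for illustration, could look like this:

```python
def usubsat(a, b):
    """Unsigned saturating subtract: a - b, clamped at zero."""
    return a - b if a > b else 0


def via_umax(a, b):
    """sub(umax(a, b), b)"""
    return max(a, b) - b


def via_umin(a, b):
    """sub(a, umin(a, b))"""
    return a - min(a, b)


def patterns_match(bits=8):
    """Compare both patterns with usubsat for every unsigned input pair."""
    limit = 1 << bits
    return all(
        via_umax(a, b) == usubsat(a, b) and via_umin(a, b) == usubsat(a, b)
        for a in range(limit)
        for b in range(limit)
    )


if __name__ == "__main__":
    print(patterns_match())
```

Neither pattern can underflow, since umax(a,b) is never below b and umin(a,b) is never above a, so the plain subtraction is safe at any width.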
Lukas Sommer	6577cef9b0	[CodeGen] New pass: Replace vector intrinsics with call to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. This is a re-try of the original commit `2303e93e66` that was reverted due to pass manager problems. Other minor changes have also been made. Differential Revision: https://reviews.llvm.org/D95373	2021-02-12 12:53:27 -05:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Petar Avramovic	f0d65f4096	AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum Implements same logic as in SelectionDAG. G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition and never NaN when one operand is known non-NaN and other known non-SNaN. G_FMINNUM and G_FMAXNUM are never NaN/SNaN when one of the operands is known non-NaN/SNaN. Differential Revision: https://reviews.llvm.org/D91716	2021-02-12 17:14:34 +01:00
Petar Avramovic	122c649c98	AMDGPU/GlobalISel: Check values of constants in isKnownNeverNaN Differential Revision: https://reviews.llvm.org/D91714	2021-02-12 17:14:34 +01:00
Simon Pilgrim	2465541dc0	[DAG] DAGTypeLegalizer::PromoteIntRes_ADDSUBSHLSAT - break if-else chain. NFCI. Style fixup - the if() block always returns so we can pull out the contents of the else() block.	2021-02-12 10:33:12 +00:00
Kazu Hirata	d61b4cb9d8	[CodeGen] Use range-based for loops (NFC)	2021-02-11 23:31:31 -08:00
Amara Emerson	de035c18cf	[GlobalISel] Fix sext_inreg(load) combine to not move the originating load. The builder was using the extend user as the insertion point, which meant that we were incorrectly "moving" the load from its original position, and therefore could violate memory operation ordering.	2021-02-11 19:27:09 -08:00
Snehasish Kumar	2c7077e67d	[CodeGen] Split out cold exception handling pads. Support for splitting exception handling pads was added in D73739. This change updates the code to split out exception handling pads if profile information indicates that they are cold. For a given function with multiple landing pads, if one of them is hot they are all retained as part of the hot code section. Differential Revision: https://reviews.llvm.org/D96372	2021-02-11 11:23:43 -08:00
Snehasish Kumar	d079dbc591	[CodeGen] Basic block sections should take precedence over splitting. The use of basic block sections should take precedence over the machine function splitting pass. Since they use the same underlying mechanism they are kept exclusive. Updated the tests to check that split machine functions is overridden by all flavours of basic block sections. Differential Revision: https://reviews.llvm.org/D96392	2021-02-11 11:14:10 -08:00
Craig Topper	5744502a13	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Simon Pilgrim	5beebf9c58	[DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI. Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.	2021-02-11 17:09:01 +00:00
Thomas Preud'homme	bad0290ce3	Improve STRICT_FSETCC codegen in absence of NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have NaN. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-11 14:19:43 +00:00
Joe Ellis	67464dfe36	[DebugInfo] Only perform TypeSize -> unsigned cast when necessary This commit moves a line in SelectionDAGBuilder::handleDebugValue to avoid implicitly casting a TypeSize object to an unsigned earlier than necessary. It was possible that we bail out of the loop before the value is ever used, which means we could create a superfluous TypeSize warning. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D96423	2021-02-11 13:54:09 +00:00
Max Kazantsev	418c218efa	Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment" The patch did not account for one corner case where cmp does not dominate the loop latch. This patch adds this check, hopefully it's cheap because the CFG does not change during the transform, so DT queries should be executed quickly. If you see compile time slowness from this, please revert. Differential Revision: https://reviews.llvm.org/D96119	2021-02-11 19:49:23 +07:00
Max Kazantsev	90081f3020	Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment" This reverts commit `3d15b7e7df`. We've found an internal failure, need to analyze.	2021-02-11 17:52:11 +07:00
Max Kazantsev	3d15b7e7df	[Codegenprepare][X86] Use usub with overflow opt for IV increment Function `replaceMathCmpWithIntrinsic` artificially limits the scope of the optimization, setting a requirement of two instructions be in the same block, due to two reasons: - usage of DT for more general check is costly in terms of compile time; - risk of creating a new value that lives through multiple blocks. Because of this, two semantically equivalent tests may be or not be the subject of this opt depending on where the binary operation is located. See `test/CodeGen/X86/usub_inc_iv.ll` for motivation There is one important particular case where this limitation is too strict: it is when the binary operation is the increment of the induction variable. As a result, the application of this opt becomes fragile and highly reliant on where other passes decide to place IV increment. In most cases, they place it in the end of the latch block, killing the opt opportunity (when in fact it does not matter where to insert the actual instruction). This patch handles this particular case separately. - The detector does not use dom tree and has constant cost; - The value of IV or IV.next lives through all loop in any case, so this should not create a new unexpected long-living value. As a result, the transform becomes more robust. It also seems to lead to better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`). Differential Revision: https://reviews.llvm.org/D96119 Reviewed By: spatel, reames	2021-02-11 11:59:45 +07:00
Kazu Hirata	c5e90a8857	[AsmPrinter] Use range-based for loops (NFC)	2021-02-10 20:01:22 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking is intentional (such as blocking code merge) for better counts quality, while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e., opeq transform 4. MIR function argument copy elision 5. IR stack protection (not perf-critical, but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Jeremy Morse	1d68e0a075	Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable Originally landed in `ddc2f1e3fb` and reverted in `d32deaab4d` because of a Generic test objecting. That was fixed up in `013613964f`. Original landing commit message follows: [DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-10 15:40:47 +00:00
Luís Marques	acac29ca42	[DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts Avoid doing the following combine for vector types: ``` copysign(x, fp_extend(y)) -> copysign(x, y) copysign(x, fp_round(y)) -> copysign(x, y) ``` That combine seemed to impede the selection of vector instruction and cause a mess in some circumstances. Differential Revision: https://reviews.llvm.org/D96037	2021-02-10 14:25:24 +00:00
Sander de Smalen	750a78cd5d	[ValueTypes] Add MVT for nxv1bf16. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96249	2021-02-10 08:50:41 +00:00
Kazu Hirata	7e75f6fc1d	[SelectionDAG] Use range-based for loops (NFC)	2021-02-09 22:14:30 -08:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Thomas Preud'homme	a50ab8672d	Revert STRICT_FCMP nonan optimisation This reverts commit `b7b61a7b5b`, which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806	2021-02-09 11:27:35 +00:00
Thomas Preud'homme	b7b61a7b5b	Improve STRICT_FSETCC codegen in absence of NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have NaN. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-09 11:18:16 +00:00
Matt Arsenault	87e280110d	GlobalISel: Use correct calling convention in handleAssignments This was using the calling convention of the calling function, not the callee. Avoids regressions in a future patch.	2021-02-08 17:09:28 -05:00
Amara Emerson	ec41ed5b1b	[AArch64][GlobalISel] Support the 'returned' parameter attribute. On AArch64 (which seems to be the only target that supports it), this attribute allows codegen to avoid saving/restoring the value in x0 across a call. Gives a 0.1% geomean -Os code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D96099	2021-02-08 12:47:39 -08:00
Simon Pilgrim	c5c690a835	[DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lambda. NFCI. This is going to be necessary for a future reuse of MergeInnerShuffle	2021-02-08 14:25:16 +00:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00

30405 Commits