llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	fa61e200e5	AMDGPU/GlobalISel: Widen non-power-of-2 load results Load extra bits if suitably aligned. This allows using widened 3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV apparently forms frequently on lowered kernel argument lists). Fix incorrectly treating these as legal on SI. This should emit a 64-bit store and a 32-bit store. I think all of the load and store rules are just about complete, but due for a rewrite.	2020-02-12 09:35:10 -05:00
Hans Wennborg	a19de32095	Fix unused function warning (PR44808)	2020-02-12 15:12:48 +01:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Jay Foad	e9900b1fbf	[AMDGPU] Add one more pass to LLVMInitializeAMDGPUTarget	2020-02-12 11:19:14 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Nicolai Hähnle	ab2f610f38	AMDGPU: llvm.amdgcn.writelane is a source of divergence Summary: Consider: %r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2) This produces a value that is 0 on lane 1, and 2 everywhere else; i.e., it is divergent. Reported-by: Marek Olsak <Marek.Olsak@amd.com> Reviewers: arsenm, foad, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74400	2020-02-12 09:12:56 +01:00
Kazushi (Jam) Marukawa	42a16dacda	[VE] Bit operator isel Summary: Isel and tests for bswap,brev,ctpop,ctlz,ctty,rotl,rotr Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D74304	2020-02-12 09:02:13 +01:00
Craig Topper	746395a446	[X86] Remove unnecessary hasSideEffects = 0, mayLoad = 1 from an instruction with a pattern. NFC	2020-02-11 23:26:29 -08:00
Craig Topper	3988b7046a	[X86] Correct the predicate on some patterns for 128 and 256 EVEX versions of VCVTPS2PH. These should require AVX512VL not AVX512F. The legacy VEX patterns will match first unless AVX512VL is enabled so this doesn't cause a functional issue.	2020-02-11 23:26:29 -08:00
Craig Topper	0daf9b8e41	[X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses them to implement soft promotion for the half type. This is enough to provide basic support for __fp16 with strictfp. Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C is enabled.	2020-02-11 22:30:04 -08:00
Matt Arsenault	6d4ebada79	AMDGPU: Use conditions directly in division expansion This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which also has the same problem. This doesn't have a observable change using SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.	2020-02-11 23:11:30 -05:00
Austin Kerbow	3a312c3ee5	[AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned Differential Revision: https://reviews.llvm.org/D74261	2020-02-11 16:57:13 -08:00
Matt Arsenault	b30e122333	AMDGPU: Don't expand more special div cases in IR These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.	2020-02-11 19:01:06 -05:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Justin Lebar	1bd6123b78	Use std::foo_t rather than std::foo in LLVM. Summary: C++14 migration. No functional change. Reviewers: bkramer, JDevlieghere, lebedev.ri Subscribers: MatzeB, hiraditya, jkorous, dexonsmith, arphaman, kadircet, lebedev.ri, usaxena95, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74384	2020-02-11 15:12:51 -08:00
Matt Arsenault	f734ce0488	AMDGPU: Fix crash on v3i15 kernel arguments This was split into 3 i15 arguments. The i15 piece needs to be rounded to a simple MVT for the memory type.	2020-02-11 18:11:39 -05:00
Matt Arsenault	92c62582fc	AMDGPU: Directly use rcp intrinsic in idiv expansions Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.	2020-02-11 18:11:39 -05:00
Matt Arsenault	b87e3e2d0d	AMDGPU: Don't create potentially dead rcp declarations This will introduce unused declarations if this doesn't reach any of the paths that will really use it.	2020-02-11 18:11:39 -05:00
Craig Topper	846d0ac43e	[X86] Don't disable code in combineHorizontalPredicateResult just because we have avx512 We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen. Differential Revision: https://reviews.llvm.org/D74431	2020-02-11 14:36:29 -08:00
Krzysztof Parzyszek	61ca996e79	[Hexagon] Don't generate short vectors in ISD::SELECT in preprocessing Selection DAG preprocessing runs long after legalization, so make sure that the types can be handled by the selection code.	2020-02-11 15:27:33 -06:00
lewis-revill	07f7c00208	[RISCV] Add support for save/restore of callee-saved registers via libcalls This patch adds the support required for using the __riscv_save and __riscv_restore libcalls to implement a size-optimization for prologue and epilogue code, whereby the spill and restore code of callee-saved registers is implemented by common functions to reduce code duplication. Logic is also included to ensure that if both this optimization and shrink wrapping are enabled then the prologue and epilogue code can be safely inserted into the basic blocks chosen by shrink wrapping. Differential Revision: https://reviews.llvm.org/D62686	2020-02-11 21:23:03 +00:00
Jay Foad	9df0c264d4	[AMDGPU] Fix implicit operands for ENTER_WWM pseudo Summary: SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an S_OR_SAVEEXEC instruction that needs certain implicit operands. Without this patch I get errors like this that make it harder to use -stop-after to bisect the pass pipeline: $ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - \| sed -E 's/ (from\|into) custom "TargetCustom[0-9]+"//' \| llc -march=amdgcn -x=mir error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc' renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec ^ Note that this error is currently only generated by MIParser but it comes with a FIXME comment: // FIXME: Move the implicit operand verification to the machine verifier. Reviewers: critson, arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74428	2020-02-11 20:11:41 +00:00
Craig Topper	d7de7ac370	[X86] Raise the latency for VectorImul from 4 to 5 in Skylake scheduler models Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel. This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis. It also matches Agner Fog. Differential Revision: https://reviews.llvm.org/D74357	2020-02-11 11:24:25 -08:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Jordan Rupprecht	734f086b42	[NFC] Fix unused var in release builds	2020-02-11 10:10:52 -08:00
Yonghong Song	29bc5dd194	[BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering Currently, isTruncateFree() and isZExtFree() callbacks return false as they are not implemented in BPF backend. This may cause suboptimal code generation. For example, if the load in the context of zero extension has more than one use, the pattern zextload{i8,i16,i32} will not be generated. Rather, the load will be matched first and then the result is zero extended. For example, in the test together with this commit, we have I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2 I2: %conv = zext i32 %0 to i64 ... I3: %2 = load i32, i32* %data, align 4, !tbaa !7 I4: %conv2 = zext i32 %2 to i64 ... I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32 I6: %conv13 = sub i32 %4, %2 ... The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is used to convert a 32bit register to 64bit one. During code generation, SUBREG_TO_REG is a noop. The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false, the current implementation will generate a SLL_ri and SRL_ri for the zext part during lowering. This patch implement isTruncateFree() in the BPF backend, so for the above example, I3 and I4 will generate a zextloadi32 DAG node with SUBREG_TO_REG is generated during lowering to Machine IR. isZExtFree() is also implemented as it should help code gen as well. This patch also enables the change in https://reviews.llvm.org/D73985 since it won't kick in generates MOV_32_64 machine instruction. Differential Revision: https://reviews.llvm.org/D74101	2020-02-11 09:59:19 -08:00
Nikita Popov	5eb19bf4a2	[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539) Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights. This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice). Differential Revision: https://reviews.llvm.org/D74155	2020-02-11 17:33:11 +01:00
Eric Astor	8d5bf0422b	[ms] [llvm-ml] Add support for attempted register parsing Summary: Add a new method (tryParseRegister) that attempts to parse a register specification. MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't. Reviewers: thakis Reviewed By: thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D73486	2020-02-11 10:45:33 -05:00
Jonas Paulsson	0311e28e9c	[SystemZ] Bugfix in emitSelect() When more than one SelectPseudo instruction is handled a new MBB is returned. This must not be done if that would result in leaving an undhandled isel pseudo behind in the original MBB. Fixes https://bugs.llvm.org/show_bug.cgi?id=44849. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74352	2020-02-11 10:41:01 -05:00
Sjoerd Meijer	6b0ed508fa	[ARM][MVE] Tail-Predication: recognise (again) active lanes IR pattern A small IR change in calculating the active lanes resulted in no longer recognising tail-predication. Now recognise both an 'add' and 'or' in the expression that calculates the active lanes. Differential Revision: https://reviews.llvm.org/D74394	2020-02-11 15:18:18 +00:00
Andrew Wei	db875f6655	[RISCV] Optimize seteq/setne pattern expansions for better code size ADDI(C.ADDI) may achieve better code size than XORI, since XORI has no C extension. This patch transforms two patterns and gets almost equivalent results. Differential Revision: https://reviews.llvm.org/D71774	2020-02-11 22:45:15 +08:00
Simon Pilgrim	fa620fc8e2	[X86] combineConcatVectorOps - reuse IsSplat and remove duplicate code. NFC.	2020-02-11 13:37:57 +00:00
Simon Pilgrim	11c16e7159	[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.	2020-02-11 12:21:03 +00:00
Mirko Brkusanin	5ba931a84a	[Mips] Add intrinsics for 4-byte and 8-byte MSA loads/stores. New intrinisics are implemented for when we need to port SIMD code from other arhitectures and only load or store portions of MSA registers. Following intriniscs are added which only load/store element 0 of a vector: v4i32 __builtin_msa_ldrq_w (const void , imm_n2048_2044); v2i64 __builtin_msa_ldr_d (const void , imm_n4096_4088); void __builtin_msa_strq_w (v4i32, void , imm_n2048_2044); void __builtin_msa_str_d (v2i64, void , imm_n4096_4088); Differential Revision: https://reviews.llvm.org/D73644	2020-02-11 11:47:30 +01:00
Kerry McLaughlin	e7755f9e4f	[AArch64][SVE] Add SVE2 intrinsics for complex integer dot product Summary: Implements the following intrinsics: - @llvm.aarch64.sve.cdot - @llvm.aarch64.sve.cdot.lane Reviewers: sdesmalen, efriedma, dancgr, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73687	2020-02-11 10:28:31 +00:00
Jay Foad	b06a13f541	[AMDGPU] Fix non-deterministic iteration order Summary: As far as I know this did not affect code generation, but it did affect the order of -debug-only=si-wqm output and the naming of autonamed values in -print-after=si-wqm output. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, mgrang, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74317	2020-02-11 09:19:30 +00:00
Craig Topper	798305d29b	[X86] Custom lower ISD::FP16_TO_FP and ISD::FP_TO_FP16 on f16c targets instead of using isel patterns. We need to use vector instructions for these operations. Previously we handled this with isel patterns that used extra instructions and copies to handle the the conversions. Now we use custom lowering to emit the conversions. This allows them to be pattern matched and optimized on their own. For example we can now emit vpextrw to store the result if its going directly to memory. I've forced the upper elements to VCVTPHS2PS to zero to keep some code similar. Zeroes will be needed for strictfp. I've added a DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra instructions in between to be closer to the previous codegen. This is a step towards strictfp support for f16 conversions.	2020-02-10 22:01:48 -08:00
diggerlin	09d26b79d2	[NFC] Refactor the tuple of symbol information with structure for llvm-objdump SUMMARY: refator the std::tuple<uint64_t, StringRef, uint8_t> to structor Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74240	2020-02-10 19:23:01 -05:00
Xiangling Liao	660b0d7f7b	[AIX] Enable frame pointer for AIX and add related test suite This patch: - enable frame pointer for AIX; - update some of red zone comments; - add/update testcases; Differential Revision: https://reviews.llvm.org/D72454	2020-02-10 15:43:41 -05:00
diggerlin	aa86311e62	[AIX][XCOFF] Support Mergeable2ByteCString and Mergeable4ByteCString SUMMARY: The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74164	2020-02-10 14:45:54 -05:00
Jonas Paulsson	fcdb99e0b5	[SystemZ] Add a subtarget cache like some other targets already have. Each function is with this compiled with the SystemZSubtarget initialized from the functions attributes. Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D74086	2020-02-10 13:10:58 -05:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatability hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Simon Pilgrim	f319074824	[X86] combineConcatVectorOps - combine X86ISD::PACKSS ops	2020-02-10 17:48:02 +00:00
Simon Pilgrim	74c0f98cf5	[X86] combineConcatVectorOps - combine X86ISD::VPERMI ops	2020-02-10 17:48:01 +00:00
Simon Pilgrim	2463b8c97d	[X86] combineConcatVectorOps - combine VSHLI/VSRAI/VSRLI ops Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-10 16:59:09 +00:00
Stanislav Mekhanoshin	ed3527c648	[AMDGPU] Split R600 and GCN subregs These are generated and do not need to have the same values. We are defining separate subregs for R600 and GCN but then using AMDGPU subregs on R600. Differential Revision: https://reviews.llvm.org/D74248	2020-02-10 08:29:56 -08:00
Simon Pilgrim	06617c4522	[X86] Add lowerShuffleAsBitRotate (PR44379) As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.	2020-02-10 16:16:56 +00:00
Luke Geeson	a67db83681	[AArch64] Make Read Write System Registers Read Only This patch makes the following System Registers Read Only: - CurrentEL - ICH_MISR_EL2 - PMBIDR_EL1 - PMSIDR_EL1 as found in: https://developer.arm.com/docs/ddi0595/e/aarch64-system-registers Relative line numbers were also added to the tests so we get more informative error messages on failure. Change-Id: I963b4f01ca5737b58f9e8e7abe9ca1d99e328758	2020-02-10 14:34:24 +00:00
Kai Nacke	34946dfd79	[SystemZ] Add implementation for the intrinsic llvm.read_register This change implements the llvm intrinsic llvm.read_register for the SystemZ platform which returns the value of the specified register (http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics). This implementation returns the value of the stack register, and can be extended to return the value of other registers. The implementation for this intrinsic exists on various other platforms including Power, x86, ARM, etc. but missing on SystemZ. Reviewers: uweigand Differential Revision: https://reviews.llvm.org/D73378	2020-02-10 08:19:10 -05:00

1 2 3 4 5 ...

56022 Commits