llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	0ba40d4ccf	AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3] Ports the existing DAG combines, minus the simplify demanded bits which seems to have no equivalent now. Without these, this isn't particularly helpful in most of the IR sample cases.	2020-04-13 19:18:19 -04:00
Austin Kerbow	a69b3e010c	[AMDGPU][GlobalISel] Fix div_scale in FDIV lowering Differential Revision: https://reviews.llvm.org/D78004	2020-04-13 15:54:49 -07:00
Heejin Ahn	ba40896f99	[WebAssembly] Fix try placement in fixing unwind mismatches Summary: In CFGStackify, `fixUnwindMismatches` function fixes unwind destination mismatches created by `try` marker placement. For example, ``` try ... call @qux ;; This should throw to the caller! catch ... end ``` When `call @qux` is supposed to throw to the caller, it is possible that it is wrapped inside a `catch` so in case it throws it ends up unwinding there incorrectly. (Also it is possible `call @qux` is supposed to unwind to another `catch` within the same function.) To fix this, we wrap this inner `call @qux` with a nested `try`-`catch`-`end` sequence, and within the nested `catch` body, branch to the right destination: ``` block $l0 try ... try ;; new nested try call @qux catch ;; new nested catch local.set n ;; store exnref to a local br $l0 end catch ... end end local.get n ;; retrieve exnref back rethrow ;; rethrow to the caller ``` The previous algorithm placed the nested `try` right before the `call`. But it is possible that there are stackified instructions before the call from which the call takes arguments. ``` try ... i32.const 5 call @qux ;; This should throw to the caller! catch ... end ``` In this case we have to place `try` before those stackified instructions. ``` block $l0 try ... try ;; this should go before 'i32.const 5' i32.const 5 call @qux catch local.set n br $l0 end catch ... end end local.get n rethrow ``` We correctly handle this in the first normal `try` placement phase (`placeTryMarker` function), but failed to handle this in this `fixUnwindMismatches`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77950	2020-04-13 15:50:01 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Rahman Lavaee	05192e585c	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. Differential Revision: https://reviews.llvm.org/D76954	2020-04-13 12:19:59 -07:00
Rahman Lavaee	4ddf7ab454	Revert "Extend BasicBlock sections to allow specifying clusters of basic blocks" This reverts commit `0d4ec16d3d` Because tests were not added to the commit.	2020-04-13 12:19:59 -07:00
Rahman Lavaee	0d4ec16d3d	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. This allows specifying BasicBlock clusters like the following example: !foo !!0 1 2 !!4 This places basic blocks 0, 1, and 2 in one section in this order, and places basic block #4 in a single section of its own.	2020-04-13 11:46:11 -07:00
Matt Arsenault	e6605a209c	DAG: Fix wrong legality check for ISD::FMAD Since `1725f28841`, this should check isFMADLegalForFAddFSub rather than the the plain isOperationLegal. This would assert in a subset of cases due to an oddity in how FMAD is selected. We will allow FMA formation pre-legalize, but not FMAD even in cases where it would be valid. The current hook requires passing in the root fadd/fsub. However, in this distributed case, this would be far more complicated to pass in the relevant operand. AMDGPU doesn't get any value from the node, and only needs the type and is the only implementor, so I'm not sure why we have this complexity. Just rename and expand the assert to avoid the more complicated checks spread through the distribution logic.	2020-04-13 10:25:39 -07:00
Craig Topper	6dbf1a1229	[X86] Move X86ShuffleDecode.cpp/h into MCTargetDesc and remove X86Utils library. NFC The shuffle decoding is used by X86ISelLowering and MCTargetDesc/X86InstComments. The latter used to be in a separate InstPrinter library. The Utils library existed to allow InstPrinter and CodeGen to share the shuffle decoding. Since X86InstComments now lives in the MCTargetDesc, which CodeGen already depends on, we can sink the shuffle decoding there as well. Differential Revision: https://reviews.llvm.org/D77980	2020-04-13 10:14:08 -07:00
Jay Foad	bc78baec4c	[X86] Improve combineVectorShiftImm Summary: Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as well as arithmetic shifts. This is needed to prevent regressions from an upcoming funnel shift expansion change. While we're here, fold (VSRAI -1, C) -> -1 too. Reviewers: RKSimon, craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77300	2020-04-13 15:54:55 +01:00
Simon Pilgrim	401cbe373b	[X86][AVX] Attempt to scale masked shuffles to match the root type Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type. This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.	2020-04-13 14:57:25 +01:00
Simon Pilgrim	fdd9ff9700	[X86][AVX] Create splitVectorIntBinary helper. Removes duplicate code from split256IntArith/split512IntArith.	2020-04-13 13:09:38 +01:00
Austin Kerbow	eab9a4f119	[AMDGPU] Don't assert on partial exec copy After Machine CSE and coalescing we can end up with copies of exec to subregister SGPRs. Differential Revision: https://reviews.llvm.org/D77992	2020-04-12 21:14:36 -07:00
Craig Topper	dbb272b0a3	[CallSite removal][FastISel] Use CallBase instead of CallSite in fastLowerCall.	2020-04-12 18:02:24 -07:00
Craig Topper	42fc7852f5	[X86] Print k-mask in FMA3 comments.	2020-04-12 13:16:53 -07:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Mircea Trofin	d2f1cd5d97	[llvm][NFC] Refactor uses of CallSite to CallBase - call promotion Summary: Updated CallPromotionUtils and impacted sites. Parameters that are expected to be non-null, and return values that are guranteed non-null, were replaced with CallBase references rather than pointers. Left FIXME in places where more changes are facilitated by CallBase, but aren't CallSites: Instruction* parameters or return values, for example, where the contract that they are actually CallBase values. Reviewers: davidxl, dblaikie, wmi Reviewed By: dblaikie Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77930	2020-04-12 08:27:29 -07:00
Sanjay Patel	d04db4825a	[x86] use vector instructions to lower FP->int->FP casts As discussed in PR36617: https://bugs.llvm.org/show_bug.cgi?id=36617#c13 ...we can avoid the likely slow round-trip from XMM to GPR to XMM by using the vector versions of the convert instructions. Based on experimental results from recent Intel/AMD chips, we don't need to worry about triggering denorm stalls while operating on garbage data in the high lanes with convert instructions, so this is expected to always be as good or better perf than the scalar instruction equivalent. FP exceptions are also not a concern because strict code should not be using the regular SDAG opcodes. Differential Revision: https://reviews.llvm.org/D77895	2020-04-12 10:26:43 -04:00
Simon Pilgrim	40581a0a2b	[X86] Use isAnyZero shuffle mask helper where possible. NFC.	2020-04-12 10:57:29 +01:00
Craig Topper	d3465e0691	[X86] Enable shuffle combining for AVX512 unless the root is used by a vselect A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets. I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing. Differential Revision: https://reviews.llvm.org/D77928	2020-04-11 20:05:10 -07:00
Matt Arsenault	96819011ca	AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts These need to be promoted and scalarized for the SALU.	2020-04-11 20:55:33 -04:00
Matt Arsenault	ac8d51a3c6	AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16 The current selector depends on 16-bit shifts using 16-bit shift amount types, but really it should accept either for all types.	2020-04-11 18:12:26 -04:00
Craig Topper	d1da1b53ff	[X86] Cleanup ISD::BRIND handling code in X86DAGToDAGISel::Select. NFC -Drop llvm:: on MVT::i32 -Use getValueType instead of getSimpleValueType for an equality check just cause its shorter and doesn't matter. -Don't create a const SDValue & since its cheap to copy. -Remove explicit case from MVT enum to EVT. -Add message to assert.	2020-04-11 15:01:05 -07:00
Craig Topper	21a7d08e72	[X86] Move code that replaces ISD::VSELECT with X86ISD::BLENDV from X86DAGToDAGISel::Select to PreprocessISelDAG	2020-04-11 15:01:05 -07:00
Matt Arsenault	c5497e5399	AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects	2020-04-11 15:59:51 -04:00
Fangrui Song	0a55d3f557	[MC] Default MCAsmInfo::UseIntegratedAssembler to true	2020-04-11 10:13:52 -07:00
Fangrui Song	d2e5157c1f	[MC] Add UseIntegratedAssembler = false. NFC	2020-04-11 10:13:49 -07:00
Matt Arsenault	cf29333f40	AMDGPU/GlobalISel: Work around forming illegal zextload after legalize Selection would fail after the post legalize combiner put an illegal zextload back together. The base combiner has parameter to only allow legal operations, but they appear to not be used. I also don't see a nice way to remove a single entry from all_combines, so just hack around this.	2020-04-11 10:52:58 -04:00
Sanjay Patel	1318ddbc14	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
Nemanja Ivanovic	512600e3c0	[PowerPC] Handle f16 as a storage type only The PPC back end currently crashes (fails to select) with f16 input. This patch expands it on subtargets prior to ISA 3.0 (Power9) and uses the HW conversions on Power9. Fixes https://bugs.llvm.org/show_bug.cgi?id=39865 Differential revision: https://reviews.llvm.org/D68237	2020-04-11 07:34:47 -05:00
Craig Topper	9c1842d8af	Change FastISel::CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI. This is the same as what was done to the CallLoweringInfo in TargetLowering.h in r309159. This is just a step on the way to replacing this with CallBase.	2020-04-10 23:45:36 -07:00
Nemanja Ivanovic	04eae39617	[PowerPC] Another folow-up fix for `6c4b40def7` There was another issue introduced by this commit that the OP initially missed. Namely, for functions that are free to use R2 as a callee-saved register, we emit a TOC expression based on the address of the GEP label without emitting the GEP label. Since we only emit such expressions for the large code model, this issue only surfaced there. I have confirmed that with this fix, the kernel build is successful with target "all".	2020-04-10 21:09:59 -05:00
Scott Constable	0505181006	[X86] Fix to X86LoadValueInjectionRetHardeningPass for possible segfault `MBB.back()` could segfault if `MBB.empty()`. Fixed by checking for `MBB.empty()` in the loop. Differential Revision: https://reviews.llvm.org/D77584	2020-04-10 18:28:08 -07:00
Sam Clegg	16206ee07d	[WebAssembly] Minor cleanup to WebAssemblySubtarget. NFC. Pretty much all other platforms pass CPU string as arg0 of initializeSubtargetDependencies. Differential Revision: https://reviews.llvm.org/D77894	2020-04-10 16:47:39 -07:00
Craig Topper	b8a108140d	[CallSite removal][X86] Remove uses of CallSite from X86WinEHState.cpp Differential Revision: https://reviews.llvm.org/D77862	2020-04-10 11:34:06 -07:00
Fangrui Song	a7aaaf7016	[MC][RISCV] Make .reloc support arbitrary relocation types Similar to D76746 (ARM), D76754 (AArch64) and llvmorg-11-init-6967-g152d14da64c (x86) Differential Revision: https://reviews.llvm.org/D77018	2020-04-10 10:43:53 -07:00
Craig Topper	a6732069ee	[CallSite removal][X86] Remove unneeded use of CallSite. NFC We already have a CallInst, we can just get the calling convention from it.	2020-04-10 10:27:21 -07:00
Fangrui Song	7f36cb1f1a	[AArch64InstPrinter] Change printAlignedLabel to print the target address in hexadecimal form Similar to D76580 (x86) and D76591 (PPC). ``` // llvm-objdump -d output (before) 10000: 08 00 00 94 bl #32 10004: 08 00 00 94 bl #32 // llvm-objdump -d output (after) 10000: 08 00 00 94 bl 0x10020 10004: 08 00 00 94 bl 0x10024 // GNU objdump -d. The lack of 0x is not ideal due to ambiguity. 10000: 94000008 bl 10020 <bar+0x18> 10004: 94000008 bl 10024 <bar+0x1c> ``` The new output makes it easier to find the jump target. Differential Revision: https://reviews.llvm.org/D77853	2020-04-10 09:21:09 -07:00
Simon Pilgrim	1824ae0f42	[X86] Remove defunct EmitLoweredAtomicFP declaration. NFC.	2020-04-10 17:05:07 +01:00
Simon Pilgrim	dd84a2f77a	[X86] Remove defunct emitFMA3Instr declaration. NFC.	2020-04-10 17:05:06 +01:00
Christopher Tetreault	65b8b643b4	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: sdesmalen, efriedma, jonpa Reviewed By: sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77265	2020-04-10 08:43:32 -07:00
Simon Pilgrim	a88cc20456	ProfileSummaryInfo.h - remove unnecessary includes. NFC Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration. This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.	2020-04-10 16:25:48 +01:00
Stanislav Mekhanoshin	44920e8566	[AMDGPU] Disable sub-dword scralar loads IR widening These will be widened in the DAG. In the meanwhile early widening prevents otherwise possible vectorization of such loads. Differential Revision: https://reviews.llvm.org/D77835	2020-04-10 08:20:49 -07:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Michael Liao	b54b4ecac3	Fix `-Wextra` warning. NFC.	2020-04-10 03:22:02 -04:00
Kai Luo	b7d5229d78	[PowerPC] Update alignment for ReuseLoadInfo in LowerFP_TO_INTForReuse In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is not updated for following loads. It's related to failed alignment check reported in https://bugs.llvm.org/show_bug.cgi?id=45297 Differential Revision: https://reviews.llvm.org/D77624	2020-04-10 05:49:19 +00:00
Nemanja Ivanovic	7f3787c0f2	[PowerPC] Bail out of redundant LI elimination on an implicit kill The transformation currently does not differentiate between explicit and implicit kills. However, it is not valid to later simply clear an implicit kill flag since the kill could be due to a call or return. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374	2020-04-09 22:17:29 -05:00
Heejin Ahn	b647de9925	[WebAssembly] Use dummy debug info in Emscripten SjLj Summary: D74269 added debug info to newly created instructions, including calls to `malloc` and `free`, by taking debug info from existing instructions around, whose debug info may or may not be empty. But there are cases debug info is required by the IR verifier: when both the caller and the callee functions have DISubprograms, meaning we already have declarations to `malloc` or `free` with a DISubprogram attached, newly created calls to `malloc` and `free` should have non-empty debug info. This patch creates a non-empty dummy debug info in this case to those calls to make the IR verifier pass. Fixes https://bugs.llvm.org/show_bug.cgi?id=45461. Reviewers: dschuff Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77784	2020-04-09 18:44:50 -07:00
Stefan Pintilie	5b18b6e9a8	[PowerPC][Future] Fix for `6c4b40def7` This is a fix for the previous patch `6c4b40def7`. In some cases it may be possible to have the compiler produce st_other=1 without the compiler using mcpu=future which should not be the case. This patch adds a guard to make sure that if we are using st_other=1 then we are also compiling for future CPU.	2020-04-10 01:12:11 +00:00
Craig Topper	5625e6ab37	[X86] Improve min/max reduction costs. This is similar to what I recently did for getArithmeticReductionCost. I'm trying to account for the narrowing from 512->256->128 as we go. I've also added a new helper method getMinMaxCost that tries to handle the cases where we have native min/max instructions and fall back to cmp+select when we don't. Differential Revision: https://reviews.llvm.org/D76634	2020-04-09 17:28:50 -07:00
Nemanja Ivanovic	5fe2809447	[PowerPC] Don't assert on SELECT_CC with i1 type When we try to select a SELECT_CC on Power9, we check if it can be matched to a SETB instruction. In that function, we assert that the output type is i32/i64. This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC. Change that from an assert to an early exit condition. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448	2020-04-09 19:27:32 -05:00
Amara Emerson	e99169f1c2	[AArch64][GlobalISel] CallLowering: Don't generate new copies each time we need to store to a stack location for outgoing args. During call arg lowering we shouldn't be modifying SP so cache the SP copy vreg for subsequent uses. Gives a 0.2% geomean code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D77838	2020-04-09 17:08:56 -07:00
Christopher Tetreault	9f87d951fc	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: mcrosier, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77269	2020-04-09 16:43:29 -07:00
James Y Knight	5e7b98fe75	Fix an unused-variable warning in Release mode.	2020-04-09 16:34:55 -04:00
Christopher Tetreault	e634f482ea	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: arsenm, efriedma, sdesmalen Reviewed By: arsenm Subscribers: wdng, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77268	2020-04-09 13:11:37 -07:00
Christopher Tetreault	e1e131ea5e	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: grosbach, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77271	2020-04-09 12:52:44 -07:00
Stefan Pintilie	64868cbfcf	[PowerPC][Future] Fix for `75828ef615` Used unsigned long where uint64_t should have been used by mistake. Fixed in this patch.	2020-04-09 19:33:12 +00:00
Simon Pilgrim	12c629ec6c	[CostModel][X86] Add shuffle costs for some common sub-128bit vectors v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.	2020-04-09 19:57:06 +01:00
Paolo Savini	fae40bd5a1	[RISCV] Add MC layer support for proposed Bit Manipulation extension (version 0.92) This adds the instruction encoding and mnenomics for the proposed RISC-V Bit Manipulation extension (version 0.92). It is implemented with each category of instruction as its own target feature, with the 'b' extension feature enabling all options. Since this extension is not yet ratified, all target features are prefixed with 'experimental-' to note their status. Differential Revision: https://reviews.llvm.org/D65649	2020-04-09 18:04:22 +01:00
jasonliu	085689d44c	[PPC][AIX] Implement variadic function handling in LowerFormalArguments_AIX Summary: This patch adds support for handling of variadic functions for AIX. This includes ensuring that use and consume correct type of va_list (char *va_list) for AIX. Authored by: ZarkoCA Reviewers: cebowleratibm, sfertile, jasonliu Reviewed by: jasonliu Differential Revision: https://reviews.llvm.org/D76130	2020-04-09 16:49:44 +00:00
Stefan Pintilie	75828ef615	[PowerPC][Future] Initial support for PCRel addressing for constant pool loads Add initial support for PC Relative addressing for constant pool loads. This includes adding a new relocation for @pcrel and adding a new PowerPC flag to identify PC relative addressing. Differential Revision: https://reviews.llvm.org/D74486	2020-04-09 11:17:23 -05:00
Kazushi (Jam) Marukawa	015dee1ac8	[VE] Support (m)0 and (m)1 operands Summary: VE has special operands to represent 0b000...000111...111 (`(m)0`) and 0b111...111000...000 (`(m)1`) bit sequences. This patch supports those operands not only in machine instructions but also in DAG lowering. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D77769	2020-04-09 18:09:00 +02:00
Craig Topper	5a55363dc4	[X86] Remove redundant VMOVDDUPZ128rmk/VMOVDDUPZ128rmkz isel patterns. These patterns are identical to the pattern for the instruction.	2020-04-09 09:06:58 -07:00
Simon Cook	2df6a02fd7	[RISCV] Implement evaluateBranch This implements the instruction analysis required to print branch targets as part of llvm-objdump's disassembly. Note, this only handles those branches which can be analyzed in a single instruction, a future patch will handle multiple-instruction patterns, such as AUIPC/LUI+JALR instruction pairs. Differential Revision: https://reviews.llvm.org/D77567	2020-04-09 15:11:55 +01:00
Shengchen Kan	2477cec2ac	[NFC][X86] Refine code in X86AsmBackend Summary: Move code to a better place, rename function, etc Tags: #llvm Differential Revision: https://reviews.llvm.org/D77778	2020-04-09 21:31:52 +08:00
Jay Foad	9c7bd94ce8	Fix typo in comment	2020-04-09 10:36:00 +01:00
Jay Foad	4970a1deca	[AMDGPU] Remove outdated comment	2020-04-09 10:36:00 +01:00
Jay Foad	c63aed890e	[KnownBits] Move AND, OR and XOR logic into KnownBits Summary: There are at least three clients for KnownBits calculations: ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the common logic should be moved out of these clients and into KnownBits itself. This patch does this for AND, OR and XOR calculations by implementing and using appropriate operator overloads KnownBits::operator& etc. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74060	2020-04-09 10:10:37 +01:00
WangTianQing	a3dc949000	[X86] Add TSXLDTRK instructions. Summary: For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Reviewers: craig.topper, RKSimon, LuoYuanke Reviewed By: craig.topper Subscribers: mgorny, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77205	2020-04-09 13:17:29 +08:00
Matt Arsenault	0aa0d70067	MIR: Use Register	2020-04-08 22:07:26 -04:00
Sam Clegg	7baad0c53c	[WebAssembly][MC] Use StringRef over std::string pointer This is followup based on feedback on `5be42f36f5`. See: https://reviews.llvm.org/D77627. Differential Revision: https://reviews.llvm.org/D77674	2020-04-08 18:28:08 -07:00
Christopher Tetreault	49fd24fe9e	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: hfinkel, efriedma, sdesmalen Reviewed By: efriedma Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77266	2020-04-08 16:10:55 -07:00
Artem Belevich	a9627b7ea7	[CUDA] Add partial support for recent CUDA versions. Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11. None of the new features of CUDA-10.2+ have been implemented yet, so using these versions will still produce a warning. Differential Revision: https://reviews.llvm.org/D77670	2020-04-08 11:19:44 -07:00
Sean Fertile	d0b57b41f4	[PowerPC][AIX][NFC] Replace deprecated getByValAlign call. Replace call to deprecated 'getByValAlign()' with 'getNonZeroByValAlign()'.	2020-04-08 13:27:39 -04:00
Matt Arsenault	dcce3ef1d2	FastISel: Partially use Register Doesn't try to convert the cases that depend on generated code.	2020-04-08 12:10:58 -04:00
Matt Arsenault	ca0ace7298	CodeGen: Use Register in MachineBasicBlock	2020-04-08 12:10:58 -04:00
Matt Arsenault	84aa58cbe2	CodeGen: Use Register in TargetLowering	2020-04-08 12:10:58 -04:00
Sean Fertile	8abfd2c3bb	[PowerPC][AIX] Enable passing byval formal arguments in multiple registers. Any or all the argument registers can be used to pass a byval formal argument, with the limitation that the argument must fit in the available registers (ie: is not split between registers and stack). Differential Revision: https://reviews.llvm.org/D76902	2020-04-08 11:16:33 -04:00
Stefan Pintilie	6c4b40def7	[PowerPC][Future] Add Support For Functions That Do Not Use A TOC. On PowerPC most functions require a valid TOC pointer. This is the case because either the function itself needs to use this pointer to access the TOC or because other functions that are called from that function expect a valid TOC pointer in the register R2. The main exception to this is leaf functions that do not access the TOC since they are guaranteed not to need a valid TOC pointer. This patch introduces a feature that will allow more functions to not require a valid TOC pointer in R2. Differential Revision: https://reviews.llvm.org/D73664	2020-04-08 08:07:35 -05:00
Simon Pilgrim	66c18c729d	[X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents Tests derived from PR42035 examples	2020-04-08 12:42:22 +01:00
Shengchen Kan	916044d819	[X86][MC] Support enhanced relaxation for branch align Summary: Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization. The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable". Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: reames Subscribers: hiraditya, llvm-commits, annita.zhang Tags: #llvm Differential Revision: https://reviews.llvm.org/D76286	2020-04-08 19:08:19 +08:00
Anna Welker	89e1248d7b	[ARM][MVE] Optimise offset addresses of gathers/scatters This patch adds an analysis of the offset addresses used by gathers and scatters to the MVEGatherScatterLowering pass to find multiplications and additions that are loop invariant and thus can be moved into the loop preheader, avoiding to execute them each time. Differential Revision: https://reviews.llvm.org/D76681	2020-04-08 11:46:57 +01:00
Kazushi (Jam) Marukawa	aa034867f1	[VE] Simplify definitions of uimm6 and simm7 Summary: To prepare continuous changes, simplify uimm6 and simm7 operands. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D77700	2020-04-08 09:53:42 +02:00
Stanislav Mekhanoshin	f96810ff34	[AMDGPU] Expand vector trunc stores from i16 to i8 Differential Revision: https://reviews.llvm.org/D77693	2020-04-07 21:47:45 -07:00
Eli Friedman	565b56a72c	[NFC] Clean up uses of LoadInst constructor.	2020-04-07 16:28:53 -07:00
Fangrui Song	624654fd64	[VE] Migrate to the getMachineMemOperand overload using llvm::Align Just delete the deprecated overload because nothing uses it.	2020-04-07 16:04:54 -07:00
Matt Arsenault	6011627f51	CodeGen: More conversions to use Register	2020-04-07 18:54:36 -04:00
Fangrui Song	2f8fb4d1cd	[VE] Adapt `aa26dd9858` and `2481f26ac3`	2020-04-07 15:45:19 -07:00
Stanislav Mekhanoshin	96e51ed005	[AMDGPU] Implement copyPhysReg for 16 bit subregs Differential Revision: https://reviews.llvm.org/D74937	2020-04-07 14:22:46 -07:00
Matt Arsenault	2481f26ac3	CodeGen: Use Register in TargetFrameLowering	2020-04-07 17:07:44 -04:00
Matt Arsenault	aa26dd9858	CodeGen: Use Register in more places	2020-04-07 15:59:40 -04:00
Nemanja Ivanovic	ecd8435483	[NFC][PowerPC] Fix register class for patterns using XXPERMDIs There are a few patterns where we use a superclass for inputs to this instruction rather than the correct class. This can sometimes lead to unncessary copies.	2020-04-07 14:06:08 -05:00
Graham Sellers	a19a56f6a1	[AMDGPU] Extend constant folding for logical operations This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_OR_B32. Also added a couple of tests for existing folds.	2020-04-07 14:37:16 -04:00
Craig Topper	c41685b16f	[SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector. This removes a call to getScalarType from a bunch of call sites. It also makes the behavior consistent with SIGN_EXTEND_INREG. Differential Revision: https://reviews.llvm.org/D77631	2020-04-07 11:34:08 -07:00
Eli Friedman	e9ac757f79	[AArch64] Don't expand memcmp in strict align mode. `7aecf232` fixed the bug where we would miscompile, but we still generate a crazy amount of code. Turn off the expansion until someone implements an appropriate heuristic. Differential Revision: https://reviews.llvm.org/D77599	2020-04-07 10:53:36 -07:00
Matt Arsenault	f596ab4066	AMDGPU: Use early return	2020-04-07 13:48:00 -04:00
Sam Clegg	5be42f36f5	[WebAssembly][MC] Fix leak of std::string members in MCSymbolWasm Summary: Fixes: https://bugs.llvm.org/show_bug.cgi?id=45452 Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77627	2020-04-07 10:38:43 -07:00
Stanislav Mekhanoshin	12a324393d	[AMDGPU] Limit endcf-collapase to simple if We can only collapse adjacent SI_END_CF if outer statement belongs to a simple SI_IF, otherwise correct mask is not in the register we expect, but is an argument of an S_XOR instruction. Even if SI_IF is simple it might be lowered using S_XOR because lowering is dependent on a basic block layout. It is not considered simple if instruction consuming its output is not an SI_END_CF. Since that SI_END_CF might have already been lowered to an S_OR isSimpleIf() check may return false. This situation is an opportunity for a further optimization of SI_IF lowering, but that is a separate optimization. In the meanwhile move SI_END_CF post the lowering when we already know how the rest of the CFG was lowered since a non-simple SI_IF case still needs to be handled. Differential Revision: https://reviews.llvm.org/D77610	2020-04-07 10:27:23 -07:00
David Tenty	b9245f14b7	[NFC][PowerPC] Cleanup 64-bit and Darwin CalleeSavedRegs Summary: - Remove the no longer used Darwin CalleeSavedRegs - Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64 - Update tests for 64-bit CSR change Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc Reviewed By: sfertile Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77235	2020-04-07 11:49:10 -04:00
Simon Pilgrim	e3b6059776	[X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors (PR45443) Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets). If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail. Fixes PR45443	2020-04-07 14:45:29 +01:00
Keith Walker	01dc10774e	[ARM] unwinding .pad instructions missing in execute-only prologue If the stack pointer is altered for local variables and we are generating Thumb2 execute-only code the .pad directive is missing. Usually the size of the adjustment is stored in a PC-relative location and loaded into a register which is then added to the stack pointer. However when we are generating execute-only code code the size of the adjustment is instead generated using the MOVW/MOVT instruction pair. As a by product of handling the execute-only case this also fixes an existing issue that in the none execute-only case the .pad directive was generated against the load of the constant to a register instruction, instead of the instruction which adds the register to the stack pointer. Differential Revision: https://reviews.llvm.org/D76849	2020-04-07 11:51:59 +01:00
Peter Smith	14c1e98754	[ARM] Remove condition that could never be true From Arm v8 Architecture Reference Manual F5.1.84 LDREXD The ldrexd instruction in Arm state has the following conditions: t = UInt(Rt); t2 = t + 1; n = UInt(Rn); if Rt<0> == '1' \|\| t2 == 15 \|\| n == 15 then UNPREDICTABLE; In when Rt is odd or if Rt is 14 (making t2 15). In the implementation when the pair is the UNPREDICTABLE R14_R15 we would ideally return SOFT_FAIL. We can't because there is no R14_R15 value for us to return so we fail early returning FAIL. The early return for registers outside the bounds of the table means the check for Rt == 14 (0xE) redundant which causes a static analyzer to flag the condition as never being true. To fix the warning I've removed the check and replaced with a comment explaining the difference with the specification. Fixes pr41660 Differential Revision: https://reviews.llvm.org/D77463	2020-04-07 09:50:56 +01:00
Sam Clegg	f0bbf3d086	[WebAssembly] EmscriptenEHSjLj: Mark more functions as imported These should have been part of https://reviews.llvm.org/D77192 Differential Revision: https://reviews.llvm.org/D77358	2020-04-06 21:27:31 -07:00
Xiang1 Zhang	01a32f2bd3	Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology) Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will cause build bot fail(not suitable for window 32 target). Summary: This patch comes from H.J.'s `2bd54ce7fa` This patch fix the failed llvm unit tests which running on CET machine. (e.g. ExecutionEngine/MCJIT/MCJITTests) The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not. If JIT's caller program is non-CET, it is no problem JIT generate CET code or not. But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions. I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed. and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too. (if not apply this patch, VNCserver will crash at CET machine.) Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei Reviewed By: LuoYuanke Subscribers: tstellar, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76900	2020-04-07 09:48:47 +08:00
Eli Friedman	68b03aee1a	Remove SequentialType from the type heirarchy. Now that we have scalable vectors, there's a distinction that isn't getting captured in the original SequentialType: some vectors don't have a known element count, so counting the number of elements doesn't make sense. In some cases, there's a better way to express the commonality using other methods. If we're dealing with GEPs, there's GEP methods; if we're dealing with a ConstantDataSequential, we can query its element type directly. In the relatively few remaining cases, I just decided to write out the type checks. We're talking about relatively few places, and I think the abstraction doesn't really carry its weight. (See thread "[RFC] Refactor class hierarchy of VectorType in the IR" on llvmdev.) Differential Revision: https://reviews.llvm.org/D75661	2020-04-06 17:03:49 -07:00
David Blaikie	5aead592f0	X86ISelLowering: Minor refactor to avoid redundant initialization while ensuring compiler warnings can hopefully still prove initialization Based on post-commit review/discussion in fabe52a7412b	2020-04-06 14:25:52 -07:00
Konstantin Pyzhov	72e8754916	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 09:05:58 -04:00
Matt Arsenault	869f05c834	AMDGPU: Remove dead paths for requiresUniformRegister The extracts from control flow intrinsics are already properly handled by divergence analysis. The inline asm case isn't dead, but has also never really worked correctly so leave it as-is for now.	2020-04-06 16:15:10 -04:00
Konstantin Pyzhov	51dc028314	Revert `e1730cfeb3`	2020-04-06 05:56:11 -04:00
Konstantin Pyzhov	e1730cfeb3	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 05:10:37 -04:00
Fangrui Song	a5d375e0cb	[AArch64] Allow logical immediates to have all-1 in top bits So that constant expressions like the following are permitted: and w0, w0, #~(0xfe<<24) and w1, w1, #~(0xff<<24) The behavior matches GNU as (opcodes/aarch64-opc.c:aarch64_logical_immediate_p). Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75885	2020-04-06 09:56:04 -07:00
Matt Arsenault	8a5f0dafd4	AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale	2020-04-06 11:50:19 -04:00
Matt Arsenault	e87ec66762	AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll	2020-04-06 11:50:16 -04:00
Jay Foad	ddd2f4b96f	[AMDGPU] Fix inaccurate comments	2020-04-06 16:44:08 +01:00
Chris Bowler	d6ea82d11c	[AIX][PPC] Implement by-val caller arguments in multiple registers Differential Revision: https://reviews.llvm.org/D76380	2020-04-06 11:06:51 -04:00
Matt Arsenault	cbf719b568	AMDGPU: Use DAG patterns for div_fmas	2020-04-06 09:28:30 -04:00
Matt Arsenault	79b29d6df7	AMDGPU: Remove DisableInst feature I'm not sure why these were bothering to check the instruction profile, since those profiles should only be used with these instruction classes.	2020-04-06 09:27:44 -04:00
Hans Wennborg	64c2312750	Revert `43f031d312` "Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology)" ExecutionEngine/MCJIT/cet-code-model-lager.ll is failing on 32-bit windows, see llvm-commits thread for `fef2dab`. This reverts commit `43f031d312` and the follow-ups `fef2dab100` and `6a800f6f62`.	2020-04-06 15:05:25 +02:00
Guillaume Chatelet	ff858d7781	[Alignment][NFC] Add DebugStr and operator* Summary: This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately) Differences from D77394: - DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)` - This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll) - Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum) Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77537	2020-04-06 12:09:45 +00:00
Simon Pilgrim	9bc5b1a489	[X86][SSE] combineVectorSignBitsTruncation - remove minimum vector length limitations truncateVectorWithPACK has its own vector length controls, so we can rely on those directly. This helps some existing truncation to subvector tests, which were being combined later during shuffle lowering at which point the sign/zero bit detection had become obscured preventing lowerShuffleWithPACK working as well as it could.	2020-04-06 12:45:23 +01:00
Kazushi (Jam) Marukawa	e981a46a77	[VE] Update lea/load/store instructions Summary: Modify lea/load/store instructions to accept `disp(index, base)` style addressing mode (called ASX format). Also, uniform the number of DAG nodes to have 3 operands for this ASX format instructions, and update selectADDR functions to lower appropriate MI. Reviewers: arsenm, simoll, k-ishizaka Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D76822	2020-04-06 11:49:46 +02:00
Oliver Stannard	a294d9eb21	Revert "[IPRA][ARM] Spill extra registers at -Oz" Reverting because this is causing failures on bots with expensive checks enabled. This reverts commit `73cea83a6f`.	2020-04-06 10:34:59 +01:00
Kerry McLaughlin	944e322f88	[AArch64][SVE] Add SVE intrinsics for saturating add & subtract Summary: Adds the following intrinsics: - @llvm.aarch64.sve.[s\|u]qadd.x - @llvm.aarch64.sve.[s\|u]qsub.x Reviewers: sdesmalen, c-rhodes, dancgr, efriedma, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77054	2020-04-06 10:07:08 +01:00
Guillaume Chatelet	6000478f39	Revert "[Alignment][NFC] Add DebugStr and operator*" This reverts commit `1e34ab98fc`.	2020-04-06 07:55:25 +00:00
Guillaume Chatelet	1e34ab98fc	[Alignment][NFC] Add DebugStr and operator* Summary: Also updates files to use them. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77394	2020-04-06 07:12:46 +00:00
Simon Pilgrim	a43e233606	Remove unused function 'isInRange'. NFCI.	2020-04-05 23:11:24 +01:00
Simon Pilgrim	4431a29c60	[X86][SSE] Combine unary shuffle(HORIZOP,HORIZOP) -> HORIZOP We had previously limited the shuffle(HORIZOP,HORIZOP) combine to binary shuffles, but we can often merge unary shuffles just as well, folding in UNDEF/ZERO values into the 64-bit half lanes. For the (P)HADD/HSUB cases this is limited to fast-horizontal cases but PACKSS/PACKUS combines under all cases.	2020-04-05 22:49:46 +01:00
Apelete Seketeli	8aadb442d1	[scan-build] fix dead store warnings emitted on LLVM AMDGPU code base This fixes dead store warnings of the type "dead assignment" reported by Clang Static Analyzer.	2020-04-05 11:19:03 -04:00
Oliver Stannard	cb6aeb2239	[ARM] Add data gathering hint instruction Summary: This patch upstreams support the optional ARMv8.0 Data Gathering Hint (DGH) extension, which adds the Data Gathering Hint instruction to the hint space. See ARMv8.0-DGH in the Arm Architecture Reference Manual Armv8 for more information. Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, danielkiss, samparker Reviewed By: SjoerdMeijer Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77097	2020-04-05 15:21:00 +01:00
Oliver Stannard	6f60eb4a3c	[ARM] Add enhanced counter virtualization system registers Summary: This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization (ECV) extension, which adds 6 new system registers. See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more information. Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill Reviewed By: SjoerdMeijer Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77094	2020-04-05 15:18:35 +01:00
Oliver Stannard	9e1455dc23	[ARM] Add ARMv8.6 Fine Grain Traps system registers Summary: This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT) extension, which adds 5 new system registers. See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more information. Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov Reviewed By: SjoerdMeijer Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76991	2020-04-05 14:28:18 +01:00
Diogo Sampaio	59d10dc703	[ARM] add ARMv8.6-A Activity monitors virtualization extension Summary: This patch upstreams v8.6A activity monitors virtualization assembler support, which consists of 32 new system registers (two groups, each with 16 numbered registers). See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more information. Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard Reviewed By: ostannard Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76998	2020-04-05 13:31:06 +01:00
Benjamin Kramer	ff889df356	[X86] Roll some loops. NFCI.	2020-04-05 13:59:50 +02:00
Simon Pilgrim	3079e51858	[X86][SSE] Generalize shuffle(HORIZOP,HORIZOP) -> HORIZOP combine Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD. This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.	2020-04-05 12:09:19 +01:00
Simon Pilgrim	a17de6b91c	[X86][SSE] truncateVectorWithPACK - upper undef for 128->64 packing If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.	2020-04-05 11:47:36 +01:00
Matt Arsenault	6bfe28e92f	AMDGPU: Fix annotate kernel features through casted calls I thought I was testing this before, but the workitem id x case isn't great since it's mandatory in the parent kernel.	2020-04-04 20:44:44 -04:00
Matt Arsenault	221890d709	AMDGPU: Add feature for fast f32 denormals	2020-04-04 20:01:24 -04:00
Simon Pilgrim	e5e719d885	[X86][SSE] lowerV8I16Shuffle - lower compaction shuffles using PACKUSDW(PBLENDW,PBLENDW) on SSE41+ Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW but that requires sign extension that seems to destroy any gains, even on targets without PSHUFB. This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful. Noticed in codegen while dealing with PR31443.	2020-04-04 13:08:25 +01:00
Craig Topper	1d42c0db9a	Revert "[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets" This reverts commit `c74dd640fd`. Reverting to address coding standard issues raised in post-commit review.	2020-04-03 16:56:08 -07:00
Craig Topper	a505ad58cf	Revert "[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)" This reverts commit `62c42e29ba` Reverting to address coding standard issues raised in post-commit review.	2020-04-03 16:55:53 -07:00
Scott Constable	62c42e29ba	[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI) After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs: http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf The algorithm implemented in this pass is as follows: 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components: -SOURCE instructions (also includes function arguments) -SINK instructions -Basic block entry points -Basic block terminators -LFENCE instructions 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6. 3. Use a heuristic or plugin to approximate minimal LFENCE insertion. 4. Insert one LFENCE along each CFG edge that was cut in step 3. 5. Go to step 2. 6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified. By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection. Differential Revision: https://reviews.llvm.org/D75937	2020-04-03 13:45:50 -07:00
Scott Constable	c74dd640fd	[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) Gadgets Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph. More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK). Also adds a new target feature to X86: +lvi-load-hardening The feature can be added via the clang CLI using -mlvi-hardening. Differential Revision: https://reviews.llvm.org/D75936	2020-04-03 13:02:04 -07:00
Scott Constable	f95a67d8b8	[X86] Add RET-hardening Support to mitigate Load Value Injection (LVI) Adding a pass that replaces every ret instruction with the sequence: pop <scratch-reg> lfence jmp *<scratch-reg> where <scratch-reg> is some available scratch register, according to the calling convention of the function being mitigated. Differential Revision: https://reviews.llvm.org/D75935	2020-04-03 12:08:34 -07:00
Matt Arsenault	30ebafaa56	CodeGen: Convert some TII hooks to use Register	2020-04-03 14:52:54 -04:00
Matt Arsenault	178050c3ba	AMDGPU: Use Register in more places	2020-04-03 14:52:54 -04:00
Matt Arsenault	e8dcb6d05e	AMDGPU: Remove redundant virtual	2020-04-03 14:52:53 -04:00
Christopher Tetreault	b600809688	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: kparzysz, sdesmalen, efriedma Reviewed By: kparzysz Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77267	2020-04-03 11:26:51 -07:00
Stanislav Mekhanoshin	0462795095	[AMDGPU] Propagate AGPR RC from PHI to its PHI operands We can fix register class of PHI based on its all AGPR uses. That leaves behind all PHIs which were already processed earlier. Propagate RC back to PHI operands of a PHI. Differential Revision: https://reviews.llvm.org/D77344	2020-04-03 11:23:02 -07:00
Simon Pilgrim	34a497b765	[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended. This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.	2020-04-03 18:26:10 +01:00
John Brawn	4ad9ca0f9e	[ARM] Fix incorrect handling of big-endian vmov.i64 Currently when the target is big-endian vmov.i64 reverses the order of the two words of the vector. This is correct only when the underlying element type is 32-bit, as actually what it should be doing is considering it a vector of the underlying type and reversing the elements of that. Differential Revision: https://reviews.llvm.org/D76515	2020-04-03 17:36:50 +01:00

1 2 3 4 5 ...

57083 Commits