llvm-project

Commit Graph

Author	SHA1	Message	Date
Scott Linder	823549a6ec	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992	2018-10-08 18:47:01 +00:00
Simon Pilgrim	6fc8d05565	[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964. Differential Revision: https://reviews.llvm.org/D52970 llvm-svn: 343991	2018-10-08 18:40:50 +00:00
Tom Stellard	14d8807d9a	AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions Summary: The 32-bit variants do not exist on VI+. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52958 llvm-svn: 343985	2018-10-08 17:49:29 +00:00
Nicolai Haehnle	ea36cd595c	AMDGPU: Future-proof {raw,struct}.buffer.atomic intrinsics Summary: The ISA is really supposed to support 64-bit atomics as well, so the data type should be an overload. Mesa doesn't use these atomics yet; in fact, I noticed this issue while trying to use the atomics from Mesa. Change-Id: I77f58317a085a0d3eb933cc7e99308c48a19f83e Reviewers: tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52291 llvm-svn: 343978	2018-10-08 16:53:48 +00:00
Sanjay Patel	8459465a44	[x86] add hadd test with no undefs, remove duplicate tests; NFC llvm-svn: 343975	2018-10-08 16:24:43 +00:00
Sanjay Patel	d48789c0c4	[x86] simplify hadd tests; NFC The tests from PR39195 don't use 2 parameters. That's the root problem for the pattern matching in isHorizontalBinOp(). llvm-svn: 343974	2018-10-08 15:56:28 +00:00
Neil Henning	6641657453	[AMDGPU] Add an AMDGPU specific atomic optimizer. This commit adds a new IR-level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform or divergent, use wavefront-wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly, we recombine the result from the single lane to each lane of the wavefront, and calculate each lane's individual offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973	2018-10-08 15:49:19 +00:00
Oliver Stannard	367b4741f4	[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969	2018-10-08 14:12:08 +00:00
Oliver Stannard	c922116a51	[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. This instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected): - A BLR instruction, in either a protected or unprotected page. - A BR instruction in a protected page, using x16 or x17. - A BR instruction in an unprotected page, using any register. Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls. Legacy code without this restriction can still indirectly tail-call BTI-protected functions, because they will be loaded into an unprotected page, so any register is allowed. Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968	2018-10-08 14:09:15 +00:00
Oliver Stannard	250e5a5b65	[AArch64][v8.5A] Branch Target Identification code-generation pass The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection. There are 3 variants of the BTI instruction, which are valid targets for different kinds of branches: - BTI C can be targeted by call instructions, and is intended to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls. - BTI J can be targeted by BR only, and is intended to be used by jump tables. - BTI JC acts as both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction. Note that RET instructions are not restricted by branch target identification; the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack). This patch adds a MachineFunctionPass which: - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so could be address-taken in another translation unit). - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be done either by a jump table, or by taking the address of the block (e.g. using the GCC labels-as-values extension). We only need to use BTI JC when a function is indirectly-callable, and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case. Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch. Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967	2018-10-08 14:04:24 +00:00
Alexander Ivchenko	1aedf203dd	[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM Support G_UDIV/G_UREM/G_SREM. The instruction selection code is taken from FastISel with only minor tweaks to adapt for GlobalISel. Differential Revision: https://reviews.llvm.org/D49781 llvm-svn: 343966	2018-10-08 13:40:34 +00:00
Sanjay Patel	60badd7584	[x86] add 16 missed hadd patterns (PR39195); NFC llvm-svn: 343965	2018-10-08 12:54:33 +00:00
Peter Smith	6f36cd4d76	[ARM] Account for implicit IT when calculating inline asm size When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ, the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly, the generic getInlineAsmLength() function is used to get a worst-case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases, such as when all the instructions in the block are 4 bytes in size and there is an implicit IT, the size is underestimated. This can cause an out-of-range CBZ or CBNZ to be generated. The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT. Fixes pr31805 Differential Revision: https://reviews.llvm.org/D52834 llvm-svn: 343960	2018-10-08 09:38:28 +00:00
Oliver Stannard	9ecdac8ee0	[AArch64] Fix verifier error when outlining indirect calls The MachineOutliner for AArch64 transforms indirect calls into indirect tail calls, replacing the call with the TCRETURNri pseudo-instruction. This pseudo lowers to a BR, but has the isCall and isReturn flags set. The problem is that TCRETURNri takes a tcGPR64 as the register argument, to prevent indirect tail-calls from using callee-saved registers. The indirect calls transformed by the outliner could use callee-saved registers. This is fine, because the outliner ensures that the register is available at all call sites. However, this causes a verifier failure when the register is not in tcGPR64. The fix is to add a new pseudo-instruction like TCRETURNri, but which accepts any GPR. Differential revision: https://reviews.llvm.org/D52829 llvm-svn: 343959	2018-10-08 09:18:48 +00:00
Alex Bradbury	5af6c1496a	[RISCV] Update alu8.ll and alu16.ll test cases The srli test in alu8.ll was a no-op, as it shifted by 8 bits. Fix this, and also change the immediate in alu16.ll, as shifting by something other than a power of 8 is more interesting. llvm-svn: 343958	2018-10-08 09:08:51 +00:00
Sanjay Patel	ecc8af61e7	[DAGCombiner] allow undef elts in vector fadd matching llvm-svn: 343945	2018-10-07 16:30:42 +00:00
Sanjay Patel	f956840dbe	[x86] add vector fadd with undef elts test; NFC llvm-svn: 343944	2018-10-07 16:27:50 +00:00
Sanjay Patel	6c02c6a3a6	[x86] remove redundant tests; NFC The equivalent tests were added to the file with related folds in rL343941. llvm-svn: 343943	2018-10-07 16:13:38 +00:00
Sanjay Patel	ef76e27985	[DAGCombiner] allow undefs when matching vector splats for fmul folds llvm-svn: 343942	2018-10-07 16:05:37 +00:00
Sanjay Patel	fcb1061c13	[x86] add vector fmul with undef elts tests; NFC llvm-svn: 343941	2018-10-07 16:00:55 +00:00
Sanjay Patel	0b74c840dd	[DAGCombiner] allow undef elts in vector fabs/fneg matching This change is proposed as a part of D44548, but we need this independently to avoid regressions from improved undef propagation in SimplifyDemandedVectorElts(). llvm-svn: 343940	2018-10-07 15:32:06 +00:00
Sanjay Patel	31a3f2aaba	[x86] add tests for FP logic folding for vectors with undefs; NFC llvm-svn: 343938	2018-10-07 15:05:39 +00:00
Simon Pilgrim	3b04a4e322	[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....). Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ. Thanks to Jan Vesely (@jvesely) for bisecting the bug. llvm-svn: 343935	2018-10-07 11:45:46 +00:00
Simon Pilgrim	012fda59a5	[AARCH64][X86] Remove _nonsplat from test names As discussed on D50222 llvm-svn: 343934	2018-10-07 11:24:04 +00:00
Alex Bradbury	47afe5e7c0	[RISCV] Introduce alu8.ll and alu16.ll tests These track the quality of generated code for simple arithmetic operations that were legalised from non-native types. llvm-svn: 343930	2018-10-07 06:53:46 +00:00
Simon Pilgrim	9fa1c66421	[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion llvm-svn: 343926	2018-10-06 22:13:44 +00:00
Simon Pilgrim	0dcf1cea03	[X86][SSE] Add SSE41 vector int2fp tests llvm-svn: 343925	2018-10-06 20:24:27 +00:00
Simon Pilgrim	62d199f4e5	[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922	2018-10-06 14:51:14 +00:00
Simon Pilgrim	944c530563	[X86] Regenerate LSR loop iteration test llvm-svn: 343921	2018-10-06 14:26:38 +00:00
Sanjay Patel	891be5af90	[x86] add test for masked store with extra shift op; NFC llvm-svn: 343920	2018-10-06 14:11:05 +00:00
Simon Pilgrim	0cc0a24b55	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises. llvm-svn: 343919	2018-10-06 13:49:31 +00:00
Simon Pilgrim	9c9c97bcf4	[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector. There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify. As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.). Fixes PR39178 Differential Revision: https://reviews.llvm.org/D52935 llvm-svn: 343913	2018-10-06 10:20:04 +00:00
Jessica Paquette	b328d95333	[GlobalIsel] Add llvm.invariant.start and llvm.invariant.end Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator and update the arm64-irtranslator test. These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select), and this patch fixes that. https://reviews.llvm.org/D52945 llvm-svn: 343885	2018-10-05 21:02:46 +00:00
Sanjay Patel	f84ece68ca	[x86] make blend tests resistant to demanded elements improvements; NFC Similar to rL343858 - we don't want these tests to lose value with D52912. llvm-svn: 343882	2018-10-05 20:26:54 +00:00
Alex Bradbury	90fc100742	[RISCV] Regenerate several tests now enableMultipleCopyHints is enabled by default r343851 caused codegen changes in several tests. This patch regenerates them. llvm-svn: 343873	2018-10-05 18:25:55 +00:00
Craig Topper	0ed892da70	[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits. The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8 bits this isn't an issue. This avoids creating a zero extend that probably won't go away. The movmskb-related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8 bits into a 32-bit immediate, so it was still bad. llvm-svn: 343871	2018-10-05 18:13:36 +00:00
Sanjay Patel	f6a160a102	[SelectionDAG] allow undefs when matching splat constants And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865	2018-10-05 17:42:19 +00:00
Sanjay Patel	8858fa8552	[x86] add test for (X - 0.0) vector with undef elts; NFC llvm-svn: 343863	2018-10-05 17:36:51 +00:00
Simon Pilgrim	90947214f3	[X86][SSE] Try to make MOVLPS/MOVHPS(+PD) instructions SimplifyDemandedElts proof Fix for D52912, which was simplifying MOVLPS/MOVHPS(+PD) instructions as the tests were only touching one of the vector halves llvm-svn: 343858	2018-10-05 15:50:18 +00:00
Sanjay Patel	00216bca66	[x86] regenerate full checks; NFC llvm-svn: 343855	2018-10-05 14:56:14 +00:00
Sanjay Patel	b7d85655f7	[x86] add test for fneg matching failure; NFC llvm-svn: 343854	2018-10-05 14:49:20 +00:00
Simon Pilgrim	6c5ab48fe7	[X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1))). This was found necessary while investigating PR39161 llvm-svn: 343853	2018-10-05 14:41:00 +00:00
Tom Stellard	7c65078f04	AMDGPU/GlobalISel: Add support for G_INTTOPTR Summary: This is a no-op. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52916 llvm-svn: 343839	2018-10-05 04:34:09 +00:00
Thomas Lively	4b47d08e52	[WebAssembly] Saturating arithmetic intrinsics Summary: Depends on D52805. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52813 llvm-svn: 343833	2018-10-05 00:45:20 +00:00
Daniel Sanders	a464ffd52c	[globalisel][combine] When placing truncates, handle the case when the BB is empty GlobalISel uses MIR with implicit fallthrough on each basic block. As a result, getFirstNonPhi() can return end(). llvm-svn: 343829	2018-10-04 23:47:37 +00:00
Yury Delendik	409b439152	[WebAssembly] Ignore DBG_VALUE in WebAssemblyCFGStackify pass when looking for block start Summary: Fixes https://bugs.llvm.org/show_bug.cgi?id=39158 and regression caused by D49034. Though it is possible the problem existed before and was exposed by additional DBG_VALUEs. Reviewers: sunfish, dschuff, aheejin Reviewed By: aheejin Subscribers: sbc100, aheejin, llvm-commits, alexcrichton, jgravelle-google Differential Revision: https://reviews.llvm.org/D52837 llvm-svn: 343827	2018-10-04 23:31:00 +00:00
Daniel Sanders	ab358bfd09	[globalisel][combine] Fix a rare crash when encountering an instruction whose op0 isn't a reg The simplest instance of this is an intrinsic with no results which will have the intrinsic ID as operand 0. Also fix some benign incorrectness when op0 is a reg but isn't a def that was guarded against by checking for the extension opcodes. llvm-svn: 343821	2018-10-04 21:44:32 +00:00
Konstantin Zhuravlyov	aa067cb9fb	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the OS is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 llvm-svn: 343813	2018-10-04 21:02:16 +00:00
Daniel Sanders	a05c7583c9	[globalisel][combine] Improve the truncate placement for the extending-loads combine This brings the extending loads patch back to the original intent but minus the PHI bug and with another small improvement to de-dupe truncates that are inserted into the same block. The truncates are sunk to their uses unless this would require inserting before a phi in which case it sinks to the _beginning_ of the predecessor block for that path (but no earlier than the def). The reason for choosing the beginning of the predecessor is that it makes de-duping multiple truncates in the same block simple, and optimized code is going to run a scheduler at some point which will likely change the position anyway. llvm-svn: 343804	2018-10-04 18:44:58 +00:00
Sanjay Patel	2cf1561f1a	[x86] add test for SSE sqrtss register dep (PR22206) llvm-svn: 343803	2018-10-04 17:59:30 +00:00
Matthias Braun	0c67a4e958	AArch64: Fix XSeqPairs/WSeqPairs problems - Fix spill/reloads of XSeqPairs failing with vregs (only physregs worked correctly) - Add missing spill/reload code for WSeqPairs class Differential Revision: https://reviews.llvm.org/D52761 llvm-svn: 343799	2018-10-04 17:02:53 +00:00
Farhana Aleen	4bc597bff5	[AMDGPU] Match signed dot4/8 pattern. Summary: This patch matches signed dot4 and dot8 pattern. Author: FarhanaAleen Reviewed By: msearles Differential Revision: https://reviews.llvm.org/D52520 llvm-svn: 343798	2018-10-04 16:57:37 +00:00
Simon Pilgrim	8ba4061d39	[X86][AVX] Add PR39161 test case for v4f64 zzww shuffle llvm-svn: 343786	2018-10-04 15:06:09 +00:00
Alex Bradbury	a4b7b6dabc	[RISCV][NFC] Remove dead CHECK lines from vararg.ll test The RISCV32 check prefix is no longer used so these lines are dead. llvm-svn: 343757	2018-10-04 07:35:52 +00:00
Alex Bradbury	e96b7c88a3	[RISCV] Bugfix for floats passed on the stack with the ILP32 ABI on RV32F f32 values passed on the stack would previously cause an assertion in unpackFromMemLoc. This would only trigger in the presence of the F extension making f32 a legal type. Otherwise the f32 would be legalized. This patch fixes that by keeping LocVT=f32 when a float is passed on the stack. It also adds test coverage for this case, and tests that also demonstrate lw/sw/flw/fsw will be selected when most profitable, i.e. there is no unnecessary i32<->f32 conversion in registers. llvm-svn: 343756	2018-10-04 07:28:49 +00:00
Thomas Lively	5d461c96bd	[WebAssembly] Bitselect intrinsic and instruction Summary: Depends on D52755. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52805 llvm-svn: 343739	2018-10-03 23:02:23 +00:00
Simon Atanasyan	757270435c	[mips] Remove -allow-deprecated-dag-overlap flag from tests. NFC Fix DAG check statements in MIPS codegen tests to remove -allow-deprecated-dag-overlap flag. llvm-svn: 343730	2018-10-03 22:02:23 +00:00
Simon Pilgrim	aabd99c27a	[X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions llvm-svn: 343708	2018-10-03 19:02:38 +00:00
Simon Pilgrim	0b451a2983	[X86][Btver2] Fix MMX PSHUFB schedule Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343701	2018-10-03 18:18:50 +00:00
Daniel Sanders	fb9b99b26e	[globalisel][combines] Don't sink G_TRUNC down to use if that use is a G_PHI This fixes a problem where the register allocator fails to eliminate a PHI because there's a non-PHI in the middle of the PHI instructions at the start of a BB. This G_TRUNC can be better placed but this at least fixes the correctness issue quickly. I'll follow up with a patch to the verifier to catch this kind of bug in future. llvm-svn: 343693	2018-10-03 15:43:39 +00:00
Nirav Dave	925b64be64	[X86] Correctly use SSE registers if no-x87 is selected. Fix use of SSE1 registers for f32 ops in no-x87 mode. Notably, allow use of SSE instructions for f32 operations in 64-bit mode (but not 32-bit, which is disallowed by calling convention). Also avoid translating memset/memcpy/memmove into SSE registers without X87 for 32-bit mode. This fixes PR38738. Reviewers: nickdesaulniers, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52555 llvm-svn: 343689	2018-10-03 14:13:30 +00:00
Alex Bradbury	efceb59801	[RISCV] Remove RV64 test lines from umulo-128-legalisation-lowering.ll The generated code is incorrect anyway, and this test adds noise to the upcoming set of patches that flesh out RV64 support. llvm-svn: 343675	2018-10-03 10:59:42 +00:00
Tim Renouf	a37679d67b	[AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics Summary: The new buffer/tbuffer intrinsics handle an out-of-range immediate offset by moving/adding offset&-4096 to a vgpr, leaving an in-range immediate offset, with a chance of the move/add being CSEd for similar loads/stores. However it turns out that a negative offset in a vgpr is illegal, even if adding the immediate offset makes it legal again. Therefore, this commit disables the offset&-4096 thing if the offset is negative. Differential Revision: https://reviews.llvm.org/D52683 Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13 llvm-svn: 343672	2018-10-03 10:29:43 +00:00
Fangrui Song	3d76d36059	[AMDGPU] Rename pass "isel" to "amdgpu-isel" Summary: The AMDGPU target specific pass "isel" is a misleading name. Reviewers: tstellar, echristo, javed.absar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52759 llvm-svn: 343659	2018-10-03 03:38:22 +00:00
Daniel Sanders	bad3936109	[globalisel] Fix one more missing Verifier pass from gisel-commandline-option.ll llvm-svn: 343658	2018-10-03 02:52:54 +00:00
Matt Arsenault	635d479322	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. llvm-svn: 343657	2018-10-03 02:47:25 +00:00
Daniel Sanders	34eac35a60	Add the missing new files from r343654 llvm-svn: 343655	2018-10-03 02:21:30 +00:00
Daniel Sanders	c973ad1878	Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 The previous commit failed portions of the test-suite on GreenDragon due to duplicate COPY instructions and iterator invalidation. Both issues have now been fixed. To assist with this, a helper (cloneVirtualRegister) has been added to MachineRegisterInfo that can be used to get another register that has the same type and class/bank as an existing one. llvm-svn: 343654	2018-10-03 02:12:17 +00:00
Thomas Lively	9075cd607d	[WebAssembly] any_true and all_true intrinsics and instructions Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52755 llvm-svn: 343649	2018-10-03 00:19:39 +00:00
Sam Clegg	b2486f118d	[WebAssembly] Stop generating helper functions in WebAssemblyLowerEmscriptenEHSjLj Previously we were creating weakly defined helper functions in each translation unit: - setThrew - setTempRet0 Instead we now assume these will be provided at link time. In emscripten they are provided in compiler-rt: https://github.com/kripken/emscripten/pull/7203 Additionally we previously created three global variables which are also now required to exist at link time instead. - __THREW__ - _threwValue - __tempRet0 Differential Revision: https://reviews.llvm.org/D49208 llvm-svn: 343640	2018-10-02 22:12:15 +00:00
Daniel Sanders	f430d941e9	[globalisel] Attempt to fix llvm-clang-x86_64-expensive-checks-win The behaviour of this bot indicates that -verify-machineinstrs has been forced on and is therefore inserting the verifier on builds that don't expect it. Explicitly specify whether it's enabled or disabled for each test. llvm-svn: 343633	2018-10-02 20:51:27 +00:00
Matt Morehouse	4b1ec17fb0	Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616	2018-10-02 18:35:44 +00:00
Krzysztof Parzyszek	528aff3372	[Hexagon] Fix extracting subvectors of non-HVX vNi1 Patch by Brendon Cahoon. llvm-svn: 343596	2018-10-02 15:05:43 +00:00
Roman Lebedev	ea2046bea9	[NFC][CodeGen][X86] fma.ll, lwp-intrinsics.ll: actually spell --check-prefixes correctly :/ llvm-svn: 343588	2018-10-02 13:34:50 +00:00
Roman Lebedev	5412be4b7a	[NFC][CodeGen][X86] lwp-intrinsics.ll: fix check prefixes llvm-svn: 343585	2018-10-02 13:11:08 +00:00
Roman Lebedev	8b253f0b54	[NFC][CodeGen][X86] fma.ll: fix check prefixes for -mcpu=bdver2 llvm-svn: 343584	2018-10-02 13:10:55 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Matt Arsenault	ab41193312	AMDGPU: Expand atomicrmw nand in IR llvm-svn: 343559	2018-10-02 03:50:56 +00:00
Thomas Lively	6f77811a21	[WebAssembly] Restore slashes in SIMD conversion names Summary: Depends on D52372 and D52442. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52512 llvm-svn: 343558	2018-10-02 01:52:21 +00:00
Fangrui Song	99d4f74d01	[AArch64][DAGCombiner]: change -stop-after=isel to instruction-select "isel" is registered by AMDGPU. The test will break if the AMDGPU target is not built. llvm-svn: 343553	2018-10-02 00:22:51 +00:00
Daniel Sanders	33f42f97af	Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 There's a strange assertion on two of the Green Dragon bots that goes away when this is reverted. The assertion is in RegBankAlloc, and if it is this commit then -verify-machineinstrs should have caught it earlier in the pipeline. llvm-svn: 343546	2018-10-01 22:32:08 +00:00
Reid Kleckner	9ea2c01264	[codeview] Emit S_FRAMEPROC and use S_DEFRANGE_FRAMEPOINTER_REL Summary: Before this change, LLVM would always describe locals on the stack as being relative to some specific register, RSP, ESP, EBP, ESI, etc. Variables in stack memory are pretty common, so there is a special S_DEFRANGE_FRAMEPOINTER_REL symbol for them. This change uses it to reduce the size of our debug info. On top of the size savings, there are cases on 32-bit x86 where local variables are addressed from ESP, but ESP changes across the function. Unlike in DWARF, there is no FPO data to describe the stack adjustments made to push arguments onto the stack and pop them off after the call, which makes it hard for the debugger to find the local variables in frames further up the stack. To handle this, CodeView has a special VFRAME register, which corresponds to the $T0 variable set by our FPO data in 32-bit. Offsets to local variables are instead relative to this value. This is part of PR38857. Reviewers: hans, zturner, javed.absar Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D52217 llvm-svn: 343543	2018-10-01 21:59:45 +00:00
Craig Topper	42cd8cd862	Recommit r343499 "[X86] Enable load folding in the test shrinking code" Original message: This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 llvm-svn: 343540	2018-10-01 21:35:28 +00:00
Craig Topper	f06a57fc89	Recommit r343498 "[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated." This includes a fix to prevent i16 compares with i32/i64 ands from being shrunk if bit 15 of the and is set and the sign bit is used. Original commit message: Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. llvm-svn: 343539	2018-10-01 21:35:26 +00:00
Stefan Pintilie	5d32a86f44	[PowerPC] Folding XForm to DForm loads requires alignment for some DForm loads. Going from XForm Load to DSForm Load requires that the immediate be 4-byte aligned. If we are not aligned we must leave the load as LDX (XForm). This bug is causing a compile-time failure in the benchmark h264ref. Differential Revision: https://reviews.llvm.org/D51988 llvm-svn: 343525	2018-10-01 20:16:27 +00:00
Daniel Sanders	9659bfda5a	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 343521	2018-10-01 18:56:47 +00:00
Matthias Braun	3e081703c3	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343520	2018-10-01 18:56:39 +00:00
Craig Topper	1346b5b7cf	[X86] Add more test shrinking with truncate and sign bit usage tests. NFC llvm-svn: 343519	2018-10-01 18:52:19 +00:00
Craig Topper	e072934d28	Revert r343499 and r343498. X86 test improvements There's a subtle bug in the handling of truncate from i32/i64 to i32 without minsize. I'll be adding more test cases and trying to find a fix. llvm-svn: 343516	2018-10-01 18:40:44 +00:00
Krzysztof Parzyszek	6d569a2cc4	[Hexagon] Remove incorrect pattern for swiz The pattern had a couple of problems: - It was checking for loads of bytes in the reverse order to what it should have been looking for. - It would replace loads of bytes with a load of a word without making sure that the alignment was correct. Thanks to Eli Friedman for pointing it out. llvm-svn: 343514	2018-10-01 18:24:40 +00:00
Matthias Braun	7159daa68e	MIRParser: Check that instructions only reference DILocation metadata llvm-svn: 343505	2018-10-01 17:50:52 +00:00
Craig Topper	aa84e1bba2	[X86] Enable load folding in the test shrinking code This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 Differential Revision: https://reviews.llvm.org/D52699 llvm-svn: 343499	2018-10-01 17:10:50 +00:00
Craig Topper	2b587ad071	[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D52669 llvm-svn: 343498	2018-10-01 17:10:45 +00:00
Simon Pilgrim	e0d2019052	[X86][Btver2] Fix BT(C|R|S)mr & BT(C|R|S)mi schedule latency + uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343494	2018-10-01 16:31:30 +00:00
Matthias Braun	004fe6bf83	DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493	2018-10-01 16:25:50 +00:00
Sanjay Patel	5187efcfab	[x86] add tests for 256- and 512-bit vector types for scalar-to-vector transform; NFC llvm-svn: 343491	2018-10-01 16:17:18 +00:00
Simon Atanasyan	1ea206be73	[mips] Generate tests expectations using update_llc_test_checks. NFC Generate tests expectations using update_llc_test_checks and reduce number of "check prefixes" used in the tests. llvm-svn: 343485	2018-10-01 14:43:07 +00:00
Clement Courbet	a933fb237e	[X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB. Summary: While looking at PR35606, I found out that the scheduling info is incorrect. One can check that it's really a P5+P6 and not a 2*P56 with: echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=- (vandps executes on P5 only) Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52541 llvm-svn: 343447	2018-10-01 08:37:48 +00:00
Carlos Alberto Enciso	81d8ef2196	[DebugInfo][Dexter] Incorrect DBG_VALUE after MCP dead copy instruction removal. When MachineCopyPropagation eliminates a dead 'copy', its associated debug information becomes invalid, as the recorded register has been removed. This causes the debugger to display the wrong variable value. Differential Revision: https://reviews.llvm.org/D52614 llvm-svn: 343445	2018-10-01 08:14:44 +00:00
Clement Courbet	ce4caff0de	[CodeGen][NFC] Add tests for heterogeneous types in MergeConsecutiveStores Reviewers: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52643 llvm-svn: 343444	2018-10-01 07:16:22 +00:00


26150 Commits